Elementary telepathy
I walk through a simple thought experiment and related questions about recreating the visual experience of one macaque in another, inspired by face decoding experiments conducted in the Tsao Lab. I add a thought experiment at the end about how to use this method to infer the average face someone has seen. I wrote an explanation of the Tsao Lab’s work on object representation here — probably necessary reading to understand what follows.
Intro and motivation
I’m interested in methods to compare representations/experiences between biological brains and the potential limitations of these approaches. The visual system is well studied, and there’s a pretty workable reference point for reasoning — we can compare multiple self-reports to an external representation (an image), and use this external representation to aid in decoding neural activity. Even though there are individual differences in visual perception, we all develop visual experience through interaction with environments governed by the same physical laws.1
More so than the example below, I’m interested in what an explanation of the limits of comparison could look like, at least with the approach I use — the approach being that you learn which neurons code for your percept of interest (and how) in subject A, then find the correspondence in subject B and perform some transformation to recreate that percept. For example, I’m skeptical that you could recreate subject A’s experience in B if B has no context for A’s experience.2
Building a translator
Chang and Tsao 2017 decode faces a macaque is viewing. This implies we could (in principle) instantiate arbitrary faces in the macaque’s visual field through stimulating face patch neurons, given sufficiently advanced neurotech.
If we have the information we need to decode faces from neural activity, can we then use the neural activity encoding a face in one macaque and instantiate the same face in another through neurostimulation?
Assumptions:
The neurons we’re stimulating are causally relevant for face perception
Neurotech that can write specific firing rates to hundreds of individual neurons is available (in research land, where this type of neuromodulation actually happens, the search term would not be “neurotech” but something like “holographic optogenetics”)
The experiment will involve 2 macaques (A and B), the presentation of face images, and neural recording and stimulation technology.
The easiest, and presumably impossible, case would be if macaques A and B had identical neuroanatomy and prior training data (they had been exposed to all the same faces with the same frequency). Then they may have functionally interchangeable STA axes and origins in their face spaces, meaning A and B’s neurons respond the same to faces.
In this case, you’d just stimulate the corresponding neurons in macaque B with the same activity you recorded from macaque A after presenting it a face. This would also be functionally the same as just playing back A’s activity to itself.
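A minimal sketch of this idealized case — the rates and the neuron correspondence below are hypothetical placeholders, not real data:

```python
import numpy as np

# Idealized case: A and B have functionally interchangeable face cells, so
# "translation" is just playing A's recorded rates into the matching neurons of B.
# The correspondence here is a hypothetical identity mapping.
rates_A = np.random.poisson(lam=10, size=205).astype(float)  # A's responses to one face
correspondence = np.arange(205)         # correspondence[i] = index of B's neuron matching A's neuron i
stim_rates_B = np.empty_like(rates_A)
stim_rates_B[correspondence] = rates_A  # target rates to write into macaque B
```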
For this thought experiment, we will treat the principal components as conserved, since they were in previous face patch work from the Tsao Lab, and assume we have access to previous neural data from macaque B such that we know which features each neuron responds to. We’ll also assume the macaques had the same training data, since they did in Chang and Tsao 2017.
But assuming A and B have different neuroanatomy, which I would, building a translator would require a few extra steps:
First, you’d determine the value of each facial feature of the face macaque A viewed.
Next, each feature value would be translated into the corresponding spike rate for a neuron with ramp-shaped tuning along that feature.
Each neuron’s activity represents the weighted average of the values of all the features it encodes. So, the spike rates (from step 2) for the set of features a given neuron responds to would be averaged to get the rate needed to stimulate that neuron, i.e. (spike rate(feature 1) + … + spike rate(feature 6)) / 6.
Stimulate the corresponding neurons with their respective spike rates in macaque B.
To return to the image from the beginning, the “translator” part would be doing steps 1-3, and the methods to do so come from Chang and Tsao.
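Here’s a rough sketch of what steps 1-3 could look like under the linear face-code model. All of the tuning parameters and recorded rates below are placeholders standing in for what you’d actually measure from each macaque:

```python
import numpy as np

# Hypothetical sketch of the translator (steps 1-3). Shapes follow Chang & Tsao's
# 50-D face space and ~200 recorded cells, but every variable here is a placeholder.
n_features, n_neurons = 50, 205

# Step 1: decode the feature values of the face macaque A viewed.
# Under the linear model R = S @ F + C, this is a least-squares inversion.
S_A = np.random.randn(n_neurons, n_features)          # A's ramp-tuning slopes (STA axes)
C_A = np.random.randn(n_neurons)                      # A's offsets
R_A = np.random.poisson(10, n_neurons).astype(float)  # A's recorded responses to the face
F_hat, *_ = np.linalg.lstsq(S_A, R_A - C_A, rcond=None)

# Step 2: map each feature value to the rate it would evoke in a macaque-B neuron
# with ramp-shaped tuning along that feature (slope * value + offset).
S_B = np.random.randn(n_neurons, n_features)          # B's slopes, measured beforehand
C_B = np.random.randn(n_neurons)
per_feature_rate = S_B * F_hat[None, :] + C_B[:, None]  # rate per (neuron, feature)

# Step 3: average across the features each neuron responds to, i.e.
# (rate(feature 1) + ... + rate(feature 6)) / 6, using a mask of which
# features each neuron encodes (here: all of them, for simplicity).
encodes = np.ones((n_neurons, n_features), dtype=bool)
stim_rates_B = np.nanmean(np.where(encodes, per_feature_rate, np.nan), axis=1)
```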
How do you know if it worked?
Let’s say you built your translator and are pretty confident that you can translate face responses between macaques. How do you know if it actually worked? Fwiw, I’m pretty interested in the general form of this question — how do we study perception in cases without self-report?
You could use a behavioral reward system where macaque B is first trained to respond to the presentation of the face you’re translating between macaques. Then, when you stimulate face patch neurons in macaque B, if it produces that response, you have some evidence that macaque B may be seeing that face. I don’t love this solution because ideally macaque B has never seen the face being translated, but it could be the best option given the epistemic constraints.
Another option, which would be much much harder, is that you could mimic the no-report paradigm from Hesse and Tsao 2020 where a macaque is trained to track a fixation spot that jumps around the image. This would require making a more complex stimulation protocol that includes the visual experience of the moving fixation spot. But if you did observe the macaque moving its eyes in sync with the fixation spot, you’d at least have the same level of certainty as demonstrated in prior work that the macaque was viewing the face or at the very least the fixation spot.
I’m not sure you’d ever actually know if it worked. Even if you translate the face image to yourself and check on your own, you’re just inverting the translation operation, so you wouldn’t know if it’s correct.
Redundancy in face patch neurons
Chang and Tsao record ~200 adjacent neurons across 3 patches to decode faces, yet this is a tiny fraction of total face patch neurons. To give a rough estimate of total face patch neurons: The ML face patch, one of the regions they recorded from, is estimated to be about 4mm in diameter so 33.5 mm^3. The macaque neocortex averages about 160,000 neurons/mm^3. Chang and Tsao were recording from 3 face patches (out of 6 total in macaques) so 200 neurons from a region of interest with about 16,080,000 neurons (160,000 x 33.5 x 3) is roughly 0.001% of face patch neurons!3
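The same back-of-envelope arithmetic, for reference (footnote 3 caveats apply):

```python
import math

# Rough estimate of the fraction of face patch neurons recorded.
diameter_mm = 4.0
volume_mm3 = (4 / 3) * math.pi * (diameter_mm / 2) ** 3  # ~33.5 mm^3 per patch
neurons_per_mm3 = 160_000
patches_recorded = 3

total_neurons = volume_mm3 * neurons_per_mm3 * patches_recorded  # ~16,080,000
fraction = 200 / total_neurons
print(f"{total_neurons:,.0f} neurons, recorded fraction ≈ {fraction:.4%}")  # ≈ 0.0012%
```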
Is this tractable?
It seems unlikely that stimulating 200 neurons in a sea of millions involved in the representation of facial features would be causally relevant enough to dramatically change perception. Is it reasonable to think you’d need to stimulate at least half of the relevant neurons that code for facial features — potentially over 15 million neurons if you count all six face patches (extending the per-patch estimate above), each of which you’d have to write a very specific firing rate to?
In stark contrast to my intuition, I’ve talked to a couple of neuroscience PIs who suspect because of dense recurrence in these areas you may only need to stimulate <1000 neurons or even <100 to drive meaningful perceptual shifts. Spectacularly, it seems like neuroscientists have been able to alter visual perception in mice by stimulating as few as 2 neurons! These results were for generating percepts corresponding to horizontal or vertical gratings, and it’s unclear how vivid or stable these percepts were. A potential counterpoint— Manley et al. 2024 record from nearly 1M neurons at cellular resolution in the mouse dorsal cortex, using light beads microscopy, and find an unbounded scaling of dimensionality.4
That said, for a working BCI, it may be difficult to place your stimulation tool exactly where these causally significant neurons are — assuming that for different percepts they’re more widely distributed than the reach of your BCI. I’d assume that to recreate a face percept, for example, you’d need an optical BCI that can read and write specific firing rates at single-neuron resolution, which would have a pretty limited depth (500-1000 microns if you’re using 2-photon).
Representational drift
If we stick to the assumptions of this experiment, eventual representational drift would also make translation incredibly difficult. Even if you understand every neuron’s tuning function and could write to every relevant neuron to instantiate some percept, I wouldn’t expect those tuning functions to be stable.
What individual neurons respond to changes over time, a phenomenon known as representational drift. Memory engrams move around, and neurons that code for components of sensory percepts (like facial features in the IT cortex) change their tuning functions. To make single-neuron stimulation for translation work, you’d need to be continually reading from individual neurons and tracking how their tuning functions change over time.
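A minimal sketch of what that tracking could look like: re-fit each neuron’s ramp-tuning parameters from fresh recordings every session and compare them to the previous fit. All data below are simulated placeholders:

```python
import numpy as np

# Placeholder dimensions: 50 face features, 205 neurons, 1000 presented faces per session.
n_features, n_neurons, n_faces = 50, 205, 1000

def fit_tuning(F, R):
    """Least-squares fit of R = S @ F + C for one session.
    F: (n_faces, n_features) feature values; R: (n_faces, n_neurons) responses."""
    X = np.hstack([F, np.ones((F.shape[0], 1))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    return coef[:-1].T, coef[-1]                  # slopes (n_neurons, n_features), offsets (n_neurons,)

F = np.random.randn(n_faces, n_features)          # feature values of the presented faces
R_day1 = np.random.randn(n_faces, n_neurons)      # responses recorded on day 1
R_day30 = np.random.randn(n_faces, n_neurons)     # responses recorded a month later

S1, C1 = fit_tuning(F, R_day1)
S30, C30 = fit_tuning(F, R_day30)
drift = np.linalg.norm(S30 - S1) / np.linalg.norm(S1)  # how far the tuning has moved
```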
Revisiting assumptions
It’s possible that the assumptions we started this thought experiment with are incorrect — namely that we should be stimulating face patch neurons to change face perception. Perhaps there is a region that the IT cortex feeds into where significantly fewer neurons can be stimulated to drive specific visual perceptual changes.
Inferring the average face
I also wondered if you could figure out a given macaque’s average “face” even if you didn’t know what faces it had been previously exposed to. I don’t think calling this its “prior” is technically accurate, but thinking of the origin of a face space as something like a prior contributed to my curiosity about this question.
If you’re lacking an average face (or the training data from which you’d calculate the average), it could be roughly determined using the same tools Chang and Tsao employed to predict faces. This case may apply more readily to humans, or to less controlled settings. If two humans participate in your study, you wouldn’t know the values of their respective “origins” in face space a priori.
Let’s assume almost the same experimental setup as Chang and Tsao except that macaque B has participated in previous face patch studies using different datasets.
The experimenter doesn’t know what faces macaque B has been previously exposed to, and it’s possible they could be largely different from this study’s dataset. For example, perhaps macaque B had previously been exposed to faces that were 90% one gender.
So while we’d expect neurons in both macaques’ face patches to have ramp-shaped tuning, the neurons’ firing rates may be relative to different averages. For example, if 90% of the faces macaque A had been exposed to were feminine, then masculine facial features may elicit higher-frequency spiking in A than they would in macaque B, which had been exposed to an even split of masculine and feminine faces.
This is because features are measured relative to the average, so macaque A viewing masculine facial features would be a large deviation from its average face.
A friend speculated that the face patch, and other regions with similar coding principles, likely adapts its principal components to maximize discriminatory power — in other words, the brain is trying to figure out what facial feature variations help it best discern the differences between faces.
To find the average face for a macaque, you can assume an average, calculate the error between the actual face and the predicted face, then iteratively test new averages that reduce this error.
Here’s how this would work in a scenario where you start out knowing macaque A’s average face but lack macaque B’s (a rough code sketch follows the steps):
Revisiting the equation from Chang and Tsao’s face decoding paper:
R = S * F + C, where R is the vector of responses of each neuron, S is a 205x50 matrix of weighting coefficients given by the slopes of the ramp-shaped tuning functions, F is the vector of facial feature values, and C is an offset vector.
Record macaque B’s neural responses R to a dataset of faces
Parameterize a face space with macaque A’s average face at the origin.
Calculate S.
Use R = S * F + C to predict F, the feature values of a given face, based on macaque B’s responses R.
Measure the error between the F the equation predicts and the actual face — the distance between each predicted feature value and the actual value.
Calculate what shift to the origin would minimize this error.
Update the origin to the value that minimizes error — this is the inferred average face the macaque has seen.
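A minimal sketch of that search, with simulated data standing in for real recordings. For simplicity the offset C is assumed known and set to zero (otherwise the origin shift can be absorbed into it), and S is given rather than fit from B’s responses as it would be in practice:

```python
import numpy as np

# Placeholder setup: 50 face features, 205 neurons, 500 presented faces.
n_features, n_neurons, n_faces = 50, 205, 500
rng = np.random.default_rng(0)

faces = rng.normal(size=(n_faces, n_features))           # faces in a shared parameterization
true_origin_B = rng.normal(scale=0.5, size=n_features)   # unknown: B's actual average face
S = rng.normal(size=(n_neurons, n_features))              # step 3: ramp-tuning slopes
R = (faces - true_origin_B) @ S.T + rng.normal(scale=0.1, size=(n_faces, n_neurons))  # step 1

F_pred = R @ np.linalg.pinv(S).T   # step 4: decode feature values from B's responses
origin = np.zeros(n_features)      # step 2: start with macaque A's average face (here, 0)

for _ in range(200):                       # steps 5-7: shift the origin to shrink the error
    residual = F_pred - (faces - origin)   # predicted features minus features measured from origin
    origin -= 0.1 * residual.mean(axis=0)  # move the origin in the error-reducing direction

# `origin` is the inferred average face; in this toy setup it lands near true_origin_B.
```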
This method should work for the Tsao Lab’s later work on object spaces as well.
Thanks to Janis Hesse, Hunter Ozawa Davis, Raffi Hotter, and Quintin Frerichs for helpful conversations and feedback.
A topological solution to object segmentation and tracking (Tsao and Tsao, 2022) feels like an interesting intuition for this. ↩︎
I also should eat my vegetables and better understand why enactivists would think this whole line of inquiry is foolish (to do: write a critique of this from the POV of a sensorimotor theory of vision and visual consciousness?) ↩︎
Face patch size may vary, and modeling a patch as a sphere may not be totally accurate, but I just wanted to generate a rough estimate. ↩︎
From Alipasha Vaziri’s website where he writes about the findings:
“Widespread application of dimensionality reduction to multi-neuron recordings implies that neural dynamics can be approximated by low-dimensional “latent” signals reflecting neural computations. However, what would be the biological utility of such a redundant and metabolically costly encoding scheme and what is the appropriate resolution and scale of neural recording to understand brain function?
Imaging the activity of one million neurons at cellular resolution and near-simultaneously across mouse cortex, we demonstrate an unbounded scaling of dimensionality with neuron number. While half of the neural variance lies within sixteen behavior-related dimensions, we find this unbounded scaling of dimensionality to correspond to an ever-increasing number of internal variables without immediate behavioral correlates. The activity patterns underlying these higher dimensions are fine-grained and cortex-wide, highlighting that large-scale recording is required to uncover the full neural substrates of internal and potentially cognitive processes.” ↩︎