Elementary telepathy 

I walk through a simple thought experiment and related questions about recreating the visual experience of one macaque in another, inspired by face decoding experiments conducted in the Tsao Lab. I add a second thought experiment at the end about how to use this method to infer the average face someone has seen. I wrote an explanation of the Tsao Lab’s work on object representation here — probably necessary reading to understand what follows.


Intro and motivation

I’m interested in methods to compare representations/experiences between biological brains and the potential limitations of these approaches. The visual system is well studied, and there’s a pretty workable reference point for reasoning — we can compare multiple self-reports to an external representation (an image), and use this external representation to aid in decoding neural activity. Even though there are individual differences in visual perception, we all develop visual experience through interaction with environments governed by the same physical laws.1


More so than the example below, I’m interested in what an explanation of the limits of comparison could look like, at least with the approach I use — the approach being that you learn which neurons code for your percept of interest in subject A and how, then find the correspondence in subject B and perform some transformation to recreate that percept. For example, I’m skeptical that you could recreate subject A’s experience in B if B has no context for A’s experience.2


Building a translator
Chang and Tsao 2017 decode faces a macaque is viewing. This implies we could (in principle) instantiate arbitrary faces in the macaque’s visual field through stimulating face patch neurons, given sufficiently advanced neurotech. 

If we have the information we need to decode faces from neural activity, can we then use the neural activity encoding a face in one macaque and instantiate the same face in another through neurostimulation?

Assumptions:

The experiment will involve 2 macaques (A and B), the presentation of face images, and neural recording and stimulation technology.

The easiest, and presumably impossible, case would be if macaques A and B had identical neuroanatomy and prior training data (they had been exposed to all the same faces with the same frequency). Then they may have functionally interchangeable STA axes and origins in their face spaces, meaning A and B’s neurons respond the same to faces.

In this case, you’d just need to stimulate the corresponding neurons in macaque B to reproduce the activity you recorded from macaque A after presenting a face to it. This would also be functionally the same as just playing back A’s activity to itself.

For this thought experiment, we will treat the principal components as conserved, since they were in previous face patch work from the Tsao lab, and assume we have access to previous neural data from macaque B such that we know which features each neuron responds to. We’ll also assume that the macaques had the same training data, since they did in Chang and Tsao 2017.

But assuming A and B have different neuroanatomy, which I would, building a translator would require a few extra steps:

1. Record macaque A’s face patch responses while it views a face.
2. Decode the face’s feature values from A’s responses using A’s tuning functions.
3. Compute the firing rates macaque B’s neurons would need to represent those feature values, using B’s tuning functions.
4. Stimulate B’s corresponding neurons at those rates.

To return to the image from the beginning, the “translator” part would be doing steps 1-3, and the methods to do so come from Chang and Tsao. A code sketch of those steps follows.
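Here is a minimal sketch of steps 1-3, assuming the linear face code from Chang and Tsao (R = S * F + C) holds for both animals and that each macaque’s S and C have already been fit from its own recordings. The matrices S_A, C_A, S_B, and C_B below are hypothetical placeholders filled with random numbers, not measured tuning parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions loosely following Chang and Tsao 2017: 205 recorded neurons, 50 face features.
n_neurons, n_features = 205, 50

# Hypothetical per-animal linear models (placeholders; in practice these would be fit from data).
S_A, C_A = rng.normal(size=(n_neurons, n_features)), rng.normal(size=n_neurons)
S_B, C_B = rng.normal(size=(n_neurons, n_features)), rng.normal(size=n_neurons)

def decode_features(r, S, C):
    """Step 2: recover the face's feature vector from recorded responses
    by inverting the linear model R = S @ F + C (least squares)."""
    return np.linalg.lstsq(S, r - C, rcond=None)[0]

def encode_responses(f, S, C):
    """Step 3: compute the firing rates the other macaque's neurons would
    need in order to represent the same feature vector under its own model."""
    return S @ f + C

# Step 1 (simulated here): macaque A views a face with true feature values f_true.
f_true = rng.normal(size=n_features)
r_A = S_A @ f_true + C_A                    # responses recorded from A's face patches

# The "translator": A's responses -> face features -> B's target firing rates.
f_hat = decode_features(r_A, S_A, C_A)
r_B_target = encode_responses(f_hat, S_B, C_B)
# Step 4 (outside the translator) would be stimulating B's neurons at r_B_target.
```

Most of the real work would be in fitting each macaque’s S and C from its own recordings; the translation itself is just a change of coordinates.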


How do you know if it worked? 

Let’s say you built your translator and are pretty confident that you can translate face responses between macaques. How do you know if it actually worked? Fwiw, I’m pretty interested in the general form of this question — how do we study perception in cases without self-report?

I’m not sure you’d ever actually know if it worked. Even if you translated the face image back to yourself and checked on your own, you’d just be inverting the translation operation, so you wouldn’t know if it’s correct.

Redundancy in face patch neurons
Chang and Tsao record ~200 neurons across 3 face patches to decode faces, yet this is a tiny fraction of total face patch neurons. To give a rough estimate: the ML face patch, one of the regions they recorded from, is estimated to be about 4 mm in diameter, so roughly 33.5 mm^3 if treated as a sphere. The macaque neocortex averages about 160,000 neurons/mm^3. Chang and Tsao were recording from 3 face patches (out of 6 total in macaques), so 200 neurons from a region of interest with about 16,080,000 neurons (160,000 x 33.5 x 3) is only about 0.001% of face patch neurons!3
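For reference, the arithmetic behind that estimate (using the rough numbers quoted above):

```python
from math import pi

# Rough estimate of the fraction of face patch neurons recorded.
patch_diameter_mm = 4.0                                            # estimated ML face patch diameter
patch_volume_mm3 = (4 / 3) * pi * (patch_diameter_mm / 2) ** 3     # ~33.5 mm^3, treating the patch as a sphere
neurons_per_mm3 = 160_000                                          # approximate macaque neocortex density
n_patches = 3                                                      # face patches recorded from
recorded_neurons = 200

total_neurons = patch_volume_mm3 * neurons_per_mm3 * n_patches     # ~16 million
print(f"{recorded_neurons / total_neurons:.4%}")                   # ~0.0012%
```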

Is this tractable?
It seems unlikely that stimulating 200 neurons in a sea of millions involved in the representation of facial features would be causally relevant enough to dramatically change perception. Is it reasonable to think you’d need to stimulate at least half of the relevant neurons that code for facial features — potentially over 15 million neurons (based on the estimate of total face patch neurons above), each of which you’d have to write very specific firing rates to?

In stark contrast to my intuition, I’ve talked to a couple of neuroscience PIs who suspect that, because of dense recurrence in these areas, you may only need to stimulate <1,000 neurons, or even <100, to drive meaningful perceptual shifts. Spectacularly, it seems like neuroscientists have been able to alter visual perception in mice by stimulating as few as 2 neurons! These results were for generating percepts corresponding to horizontal or vertical gratings, and it’s unclear how vivid or stable these percepts were. A potential counterpoint: Manley et al. 2024 record from nearly 1M neurons at cellular resolution in the mouse dorsal cortex, using light beads microscopy, and find unbounded scaling of dimensionality.4

That said, for a working BCI, it may be difficult to place your stimulation tool exactly where these causally significant neurons are — assuming that for different percepts they’re more widely distributed than the reach of your BCI. To recreate a face percept, for example, I’d assume you’d need an optical BCI capable of single-neuron read/write at specific firing frequencies, which would have a pretty limited depth (500-1000 microns if you’re using 2-photon).

Representational drift
If we stick to the assumptions of this experiment, eventual representational drift would also make translation incredibly difficult. Even if you understand every neuron’s tuning function and could write to every relevant neuron to instantiate some percept, I wouldn’t expect those tuning functions to be stable.

What individual neurons respond to changes over time, a phenomenon known as representational drift. Memory engrams move around, and neurons that code for components of sensory percepts (like facial features in the IT cortex) change their tuning functions. To make single-neuron stimulation for translation work, you’d need to be continually reading from individual neurons and tracking how their tuning functions change over time.
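As a sketch of what that tracking could look like: periodically refit each neuron’s slopes and offset from a recent window of stimulus-response pairs. The sliding-window refit below is my own assumption about how you might do this, not a procedure from the drift literature, and it presumes you can keep presenting known faces and recording from the same neurons.

```python
import numpy as np

def refit_tuning(responses, features):
    """Re-estimate linear tuning (slopes S and offsets C) for all recorded
    neurons from a recent window of trials, via least squares.

    responses: (n_trials, n_neurons) firing rates
    features:  (n_trials, n_features) face-space coordinates of the shown faces
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])   # append an intercept column
    coef, *_ = np.linalg.lstsq(X, responses, rcond=None)         # shape (n_features + 1, n_neurons)
    S = coef[:-1].T    # slopes, shape (n_neurons, n_features)
    C = coef[-1]       # offsets, shape (n_neurons,)
    return S, C

# Hypothetical recalibration loop: before each translation session, refit on the
# most recent calibration trials so the translator tracks the drifting tuning.
# S_B, C_B = refit_tuning(recent_responses, recent_features)
```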

Revisiting assumptions
It’s possible that the assumptions we started this thought experiment with are incorrect — namely that we should be stimulating face patch neurons to change face perception. Perhaps there is a region that the IT cortex feeds into where significantly fewer neurons can be stimulated to drive specific visual perceptual changes.


Inferring the average face 

I also wondered if you could figure out a given macaque’s average “face” even if you didn’t know what faces it had been previously exposed to. I don’t think calling this its “prior” is technically accurate, but analogizing it to the origin of a face space contributed to my curiosity about this question.


If you’re lacking an average face (or the training data from which you’d calculate the average), it could be roughly determined using the same tools Chang and Tsao employed to predict faces. This case may apply more readily to humans, or to less controlled settings. If two humans participated in your study, you wouldn’t know the values of their respective “origins” in face space a priori.


Let’s assume almost the same experimental setup as Chang and Tsao except that macaque B has participated in previous face patch studies using different datasets.

The experimenter doesn’t know what faces macaque B has previously been exposed to, and it’s possible they were largely different from this study’s dataset. For example, perhaps macaque B had previously been exposed to faces that were 90% one gender.

So while we’d expect neurons in both macaques’ face patches to have ramp-shaped tuning, the neurons’ firing rates may be relative to different averages. For example, if 90% of the faces macaque A had been exposed to were feminine while macaque B had seen an even split of masculine and feminine faces, then masculine facial features may elicit higher-frequency spiking in A than in B. This is because features are measured relative to the average face, so for macaque A, masculine facial features would be a large deviation from its average.
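A toy numerical illustration of that point, with a single made-up neuron and a single “masculinity” feature (all numbers are invented):

```python
# The same face drives different firing rates depending on each macaque's average face.
slope = 2.0               # spikes/s per unit of feature deviation (made up)
baseline = 10.0           # spikes/s at the average face (made up)

face_masculinity = 0.9    # a fairly masculine face on a 0-1 scale

avg_A = 0.1               # macaque A's average: mostly feminine training faces
avg_B = 0.5               # macaque B's average: an even split

rate_A = baseline + slope * (face_masculinity - avg_A)   # 11.6 spikes/s
rate_B = baseline + slope * (face_masculinity - avg_B)   # 10.8 spikes/s
# The identical face is a larger deviation from A's average, so A's neuron fires faster.
```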

A friend speculated that the face patch, and other regions with similar coding principles, likely adapts its principal components to maximize discriminatory power — in other words, the brain is trying to figure out what facial feature variations help it best discern the differences between faces.

To find the average face for a macaque, you can assume an average, calculate the error between the actual face and the predicted face, then iteratively test new averages that reduce this error.

Here’s how this would work in a scenario where you start out knowing macaque A’s average face but lack macaque B’s:

Revisiting the equation from Chang and Tsao’s face decoding paper:
R = S * F + C, where R is the vector of responses of each neuron, S is a 205x50 matrix of weighting coefficients given by the slopes of the ramp-shaped tuning functions, F is the vector of facial feature values, and C is an offset vector.
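Here is a rough sketch of the iterative procedure described above under that equation: guess macaque B’s offset C (which stands in for its average face), decode the presented faces under that guess, measure the error against the faces actually shown, and nudge the guess to reduce the error. Everything below (the simulated S, C, faces, noise level, and learning rate) is a placeholder for illustration, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_features, n_trials = 205, 50, 500

# Simulated ground truth for macaque B: the tuning slopes S_B are assumed known
# (from B's ramp-shaped tuning), but the offset C_B, which encodes B's average
# face, is what we want to infer.
S_B = rng.normal(size=(n_neurons, n_features))
C_B_true = rng.normal(size=n_neurons)

# Faces the experimenter presents (known feature values) and B's recorded responses.
F = rng.normal(size=(n_trials, n_features))
R = F @ S_B.T + C_B_true + 0.1 * rng.normal(size=(n_trials, n_neurons))

S_pinv = np.linalg.pinv(S_B)    # used to decode features from responses

def decoding_error(C_guess):
    """Mean squared error between the faces actually shown and the faces
    predicted from B's responses under a guessed offset (average face)."""
    F_hat = (R - C_guess) @ S_pinv.T
    return np.mean((F_hat - F) ** 2)

# Start from an arbitrary guess and iteratively move it downhill on the error.
C_guess = np.zeros(n_neurons)
lr = 5.0
for _ in range(500):
    F_hat = (R - C_guess) @ S_pinv.T
    grad = -2.0 * S_pinv.T @ (F_hat - F).mean(axis=0)   # gradient of the error w.r.t. C_guess
    C_guess -= lr * grad

print(decoding_error(C_guess))   # should shrink toward the noise floor
```

With a fully linear model you could also solve for the best-fitting offset in closed form; the iterative version above just mirrors the guess-and-reduce-error procedure described in the text.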



This method should work for the Tsao Lab’s later work on object spaces as well.



Thanks to Janis Hesse, Hunter Ozawa Davis, Raffi Hotter, and Quintin Frerichs for helpful conversations and feedback.