Translating visual experience between brains

I wrote a very simple thought experiment and related questions about recreating the visual experience of one macaque in another, inspired by face decoding experiments conducted in the Tsao Lab. I wrote an explanation of the Tsao Lab’s work on object representation here — probably necessary reading to understand what follows.


Outline

[Figure: outline of the thought experiment, with numbered steps for translating a face percept from macaque A to macaque B]
Intro and motivation

I was motivated to write this because I’m interested in methods to compare representations/experiences between biological brains and the potential limitations of these approaches.

The visual system is well studied, and a perk of picking the visual system for this thought experiment is that there’s a pretty workable reference point for reasoning — we can compare multiple self-reports to an external representation (an image), and use this external representation to aid in decoding neural activity. Even though there are individual differences in visual perception, we all develop visual experience through interaction with environments governed by the same physical laws.1 In contrast, an experience evoked by a concept like Burma or Chanel may not be as tractable to decode and compare between subjects in a way that tells us something about the “nature” of that representation.

More so than the example I give below, I’m interested in what an explanation of the limits of comparison could look like, at least with the approach I use — the approach being that you learn which neurons in subject A code for your percept of interest and how, then find the corresponding neurons in subject B and perform some transformation to recreate that percept. For example, I’m skeptical that you could recreate subject A’s experience in B if B has no context for A’s experience.2


Building a translator
Chang and Tsao 2017 decode the faces a macaque is viewing from recorded face patch activity. This implies that, given sufficiently advanced neurotech, we could (in principle) instantiate arbitrary faces in the macaque’s visual field by stimulating face patch neurons.

If we have the information we need to decode faces from neural activity, can we then use the neural activity encoding a face in one macaque to instantiate the same face in another through neurostimulation?
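To make the premise concrete, here’s a minimal toy sketch of the kind of linear face code Chang and Tsao describe: each face is a point in a roughly 50-dimensional face space, each cell’s firing rate is approximately a linear function of that point, and decoding is linear regression from population activity back into face space. The synthetic tuning axes and all of the numbers below are placeholder assumptions, not the published data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_faces, n_dims, n_cells = 2000, 50, 200     # ~50-d face space, ~200 recorded cells

# Synthetic faces as points in face space (stand-ins for shape/appearance PCs).
faces = rng.standard_normal((n_faces, n_dims))

# Hypothetical linear tuning: one preferred ("STA") axis per cell, plus noise.
axes_A = rng.standard_normal((n_dims, n_cells))
rates_A = faces @ axes_A + 0.5 * rng.standard_normal((n_faces, n_cells))

# Decoding: linear regression from population rates back to face features.
decoder_A, *_ = np.linalg.lstsq(rates_A, faces, rcond=None)

# Held-out face: decode it from A's (simulated) activity.
new_face = rng.standard_normal(n_dims)
decoded = (new_face @ axes_A) @ decoder_A
print(round(float(np.corrcoef(decoded, new_face)[0, 1]), 3))   # close to 1.0
```

In the real experiment the features are shape and appearance principal components of actual face images and the responses are recorded spike rates; the point here is just that the decode step is a linear map.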

Assumptions:

The experiment will involve 2 macaques (A and B), the presentation of face images, and neural recording and stimulation technology.

The easiest, and presumably impossible, case would be if macaques A and B had identical neuroanatomy and prior training data (they had been exposed to all the same faces with the same frequency). Then they may have functionally interchangeable STA axes and origins in their face spaces, meaning A and B’s neurons respond the same to faces.

In this case, you’d just need to stimulate the corresponding neurons in macaque B to reproduce the activity you recorded from macaque A while presenting it a face. This would also be functionally the same as just playing A’s activity back to itself.

For this thought experiment, we’ll treat the principal components of face space as conserved across animals, since they were in previous face patch work from the Tsao Lab, and assume we have access to prior neural data from macaque B, so we know which features each of its neurons responds to. We’ll also assume the macaques had the same training data, since they did in Chang and Tsao 2017.

But assuming A and B have different neuroanatomy, which I would, building a translator would require a few extra steps (sketched in code below):

1. Record face patch activity in macaque A while it views a face.
2. Decode the face’s position in face space from A’s recorded activity.
3. Compute the firing rates macaque B’s neurons would need in order to represent that same face, given B’s tuning axes.
4. Stimulate B’s neurons to produce those firing rates.

To return to the image from the beginning, the “translator” part would be doing steps 1-3, and the methods to do so come from Chang and Tsao. 
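Under the assumptions above (shared face-space dimensions, tuning measured separately in each animal), the translator itself is just “decode in A’s coordinates, re-encode in B’s.” Here’s a self-contained toy sketch; axes_A, axes_B, and everything else in it are invented stand-ins for models you would have to fit from each animal’s own recordings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_faces, n_dims, n_cells = 2000, 50, 200

# Hypothetical tuning axes for each animal, fit separately from each animal's
# own recordings (they differ because the anatomy differs, even though the
# 50 face-space dimensions are assumed shared).
axes_A = rng.standard_normal((n_dims, n_cells))
axes_B = rng.standard_normal((n_dims, n_cells))

# Calibration (steps 1-2): record A viewing known faces, fit a linear decoder.
faces = rng.standard_normal((n_faces, n_dims))
rates_A = faces @ axes_A + 0.5 * rng.standard_normal((n_faces, n_cells))
decoder_A, *_ = np.linalg.lstsq(rates_A, faces, rcond=None)

def translate(rates_from_A: np.ndarray) -> np.ndarray:
    """One trial of A's population activity -> target firing rates for B."""
    face_hat = rates_from_A @ decoder_A    # step 2: decode face-space position
    return face_hat @ axes_B               # step 3: re-encode with B's tuning

# A views a new face; the output is what step 4 (stimulation) would write into B.
target_rates_B = translate(rng.standard_normal(n_dims) @ axes_A)
print(target_rates_B.shape)                # (200,): one target rate per neuron in B
```

In the identical-anatomy case from earlier, axes_A and axes_B would coincide and this reduces to simply playing A’s activity back.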


How do you know if it worked? 

Let’s say you built your translator and are pretty confident that you can translate face responses between macaques. How do you know if it actually worked? Fwiw, I’m pretty interested in the general form of this question: how do we study perception in cases without self-report?

I’m not sure you’d ever actually know if it worked. Even if you translated the face to yourself and checked on your own, you’d just be inverting the translation operation, so you still wouldn’t know if it’s correct.

Redundancy in face patch neurons
Chang and Tsao record ~200 neurons across 3 face patches to decode faces, yet this is a tiny fraction of total face patch neurons. To give a rough estimate of total face patch neurons: the ML face patch, one of the regions they recorded from, is estimated to be about 4mm in diameter, which treated as a sphere is roughly 33.5 mm^3. The macaque neocortex averages about 160,000 neurons/mm^3. Chang and Tsao were recording from 3 face patches (out of 6 total in macaques), so assuming each patch is roughly ML-sized, that’s a region of interest with about 16,080,000 neurons (160,000 x 33.5 x 3), and 200 neurons is only about 0.001% of them!3
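Here is the same back-of-envelope arithmetic in one place, using the rough figures quoted above and treating each recorded patch as an ML-sized sphere; none of these are measured values.

```python
import math

# Rough numbers quoted above, not measurements.
patch_diameter_mm = 4.0
patch_volume_mm3 = (4 / 3) * math.pi * (patch_diameter_mm / 2) ** 3   # ~33.5 mm^3
neurons_per_mm3 = 160_000          # macaque neocortex average
patches_recorded = 3               # out of 6 face patches
recorded_cells = 200

total = patch_volume_mm3 * neurons_per_mm3 * patches_recorded          # ~16 million
print(f"~{total:,.0f} neurons; recorded fraction ~{recorded_cells / total:.1e} (~0.001%)")
```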

Is this tractable?
It seems unlikely that stimulating 200 neurons in a sea of millions involved in the representation of facial features would be causally relevant enough to dramatically change perception. Is it reasonable to think you’d need to stimulate at least half of the relevant neurons that code for facial features, potentially over 15 million neurons (based on estimates of total face patch neurons) that you’d have to write very specific firing rates to?

In stark contrast to my intuition, I’ve talked to a couple of neuroscience PIs who suspect that, because of dense recurrence in these areas, you may only need to stimulate <1,000 neurons, or even <100, to drive meaningful perceptual shifts. Spectacularly, it seems like neuroscientists have been able to alter visual perception in mice by stimulating as few as 2 neurons! These results were for generating percepts corresponding to horizontal or vertical gratings, and it’s unclear how vivid or stable those percepts were. A potential counterpoint: Manley et al. 2024 record from nearly 1M neurons at cellular resolution in the mouse dorsal cortex using light beads microscopy and find an unbounded scaling of dimensionality.4

That said, for a working BCI, it may be difficult to place your stimulation tool exactly where these causally significant neurons are — assuming that, for different percepts, they’re more widely distributed than the reach of your BCI. I’d assume that to recreate a face percept, for example, you’d need an optical BCI capable of single-neuron read/write, which would have a pretty limited reach (500-1000 microns if you’re using 2-photon).5

Representational drift
If we stick to the assumptions of this experiment, eventual representational drift would also make translation incredibly difficult. Even if you understand every neuron’s tuning function and could write to every relevant neuron to instantiate some percept, I wouldn’t expect those tuning functions to be stable.

What individual neurons respond to changes over time, a phenomenon known as representational drift. Memory engrams move around, and neurons that code for components of sensory percepts (like facial features in the IT cortex) change their tuning functions. To make single-neuron stimulation for translation work, you’d need to be continually reading from individual neurons and tracking how their tuning functions change over time.
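As a toy illustration of what “continually reading and tracking” could look like, here’s a sketch that re-fits each neuron’s linear tuning axis every session from fresh stimulus/response pairs and checks how much the fitted axes have moved. The drift model and every number in it are made up for illustration; real drift statistics would have to come from chronic recordings.

```python
import numpy as np

rng = np.random.default_rng(2)
n_dims, n_cells, trials = 50, 200, 500

true_axes = rng.standard_normal((n_dims, n_cells))   # tuning at session 0
prev_fit = None

for session in range(5):
    # Made-up drift: each cell's tuning axis wanders a little between sessions.
    true_axes = true_axes + 0.2 * rng.standard_normal(true_axes.shape)

    # Recalibrate: present known faces, record responses, re-fit the axes.
    faces = rng.standard_normal((trials, n_dims))
    rates = faces @ true_axes + rng.standard_normal((trials, n_cells))
    fit, *_ = np.linalg.lstsq(faces, rates, rcond=None)

    if prev_fit is not None:
        # Per-cell cosine similarity between this fit and the previous one.
        cos = (fit * prev_fit).sum(axis=0) / (
            np.linalg.norm(fit, axis=0) * np.linalg.norm(prev_fit, axis=0))
        print(f"session {session}: mean similarity to previous fit = {cos.mean():.3f}")
    prev_fit = fit
```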

Revisiting assumptions
It’s possible that the assumptions we started this thought experiment with are incorrect — namely that we should be stimulating face patch neurons to change face perception. Perhaps there is a region that the IT cortex feeds into where significantly fewer neurons can be stimulated to drive specific visual perceptual changes.

Thanks to Janis Hesse, Hunter Ozawa Davis, Raffi Hotter, and Quintin Frerichs for helpful conversations and feedback.