Recently, I’ve been working on a project to apply persistent homology to neural spike train data, with the particular goal of seeing whether this technique can reveal “low frequency” relationships and co-firing patterns that standard dimensionality reduction methods like PCA throw away. For example, some neurons fire at a very high rate, around 75-100 Hz, while most neurons fire at around 10-30 Hz in response to some stimulus. The loud, shouty neurons are drowning out the quiet, contemplative ones! What do the quiet neurons know? More to the point, how do I get it out of them?
Persistent homology has been applied to neural data before, particularly by Singh et al. to model vision and by Dabaghian et al. to model the behavior of place cells. Place cells are phenomenal, highly structured little cells in your hippocampus that fire whenever you find yourself in a particular, local “place”. If you’re in a room, there might be a place cell for the corner of the room, a place cell for the center, and another one for the area right in front of the door! The problem with Dabaghian et al.’s analysis, at least for my purposes, is that they are using persistent homology not (only) as a data analytics tool but as part of a biological hypothesis about how the place cells work. That is, they conjecture that the neurons downstream of the place cells interpret the co-firing of place cells as natively topological information (information about “connectedness, adjacency, and containment”) rather than exact geometric information about distances. They don’t necessarily suggest that our brain is running persistent homology, but they point out that the co-firing of place cells with neighboring place fields suggests that something like it is very possible. (This shouldn’t be a big surprise for us. After all, homology itself comes down to a very basic and universal phenomenon: the need to piece together many small pieces of information in order to get a global picture of the environment.)
I’ve run into a few problems in applying this to the spike train data I have:
- The task I’m facing is straight data analysis (the data comes from neurons in mice reacting to different odors), and there are no obvious “geometric hooks” like place cells that encourage a topological perspective. In other words, there’s no obvious topology in the odorant space (as opposed to the topology of a room with a hole in the middle) that lets me verify my findings.
- In their experiment, Dabaghian et al. are generating the “temporal” simplicial complex directly from the time-binned data, not forming (as is usual in persistent homology) a Rips complex from a cover over points in a metric space.
- Okay, so I want points in a metric space. But this is a time series… and because there are so many neurons firing (or not firing) at each point in time (really a very small interval of time), each point is almost maximally distant from every other point in Hamming distance!
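To make the last two points concrete, here is a minimal sketch (not the pipeline from either paper, and not my real data) of what the time-binned picture looks like. The spike times, the neuron names, the bin width, and the helper names `bin_spikes`, `hamming`, and `cofiring_groups` are all made up for illustration. Each time bin becomes a binary vector over neurons; the set of neurons active in a bin is the kind of co-firing group a “temporal” complex would be built from, and the Hamming distance between two bins just counts the neurons whose on/off state differs.

```python
def bin_spikes(spike_times, n_bins, bin_width):
    """Turn spike times into one binary population vector per time bin.

    spike_times: dict mapping a neuron label to a list of spike times (seconds).
    Returns the sorted neuron labels and a list of n_bins binary vectors.
    """
    neurons = sorted(spike_times)
    bins = [[0] * len(neurons) for _ in range(n_bins)]
    for j, neuron in enumerate(neurons):
        for t in spike_times[neuron]:
            b = int(t // bin_width)
            if 0 <= b < n_bins:
                bins[b][j] = 1  # record "fired at least once", not a count
    return neurons, bins


def hamming(u, v):
    """Number of coordinates in which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def cofiring_groups(neurons, bins):
    """Distinct sets of neurons that were active together in some bin."""
    return {
        frozenset(n for n, active in zip(neurons, row) if active)
        for row in bins
        if any(row)
    }


# Hypothetical toy data: four neurons, 0.4 s of recording, 0.1 s bins.
spikes = {
    "a": [0.01, 0.12, 0.31],
    "b": [0.02, 0.33],
    "c": [0.14],
    "d": [0.22],
}
neurons, bins = bin_spikes(spikes, n_bins=4, bin_width=0.1)
groups = cofiring_groups(neurons, bins)

# Pairwise Hamming distances between time bins.
distances = {
    (i, j): hamming(bins[i], bins[j])
    for i in range(len(bins))
    for j in range(i + 1, len(bins))
}
```

Even in this toy case, two bins with disjoint active sets sit at a distance equal to the total number of active neurons between them, which is the behavior I am complaining about above.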
What do I do? There’s one normal approach to time series: smooth (e.g. interpolate), then decompose the smoothed data into a Fourier basis, then compare the spike trains of individual neurons in the resulting metric space. Intuitively I find this dubious (how is this better than PCA if I’m throwing out a lot of information when I smooth?). And it also misses the fact that what I really want is to study the co-firing information.
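For reference, the smooth-then-Fourier recipe might look something like the sketch below. Everything specific here is my own assumption rather than a standard: the Gaussian kernel (instead of interpolation), the choice to keep only the magnitudes of the first few coefficients (which throws away phase), the Euclidean metric, and the toy spike counts. The function names are hypothetical too.

```python
import cmath
import math


def gaussian_smooth(counts, sigma):
    """Convolve binned spike counts with a normalized Gaussian (sigma in bins).

    Edges are zero-padded, so spikes near the boundary lose a little mass.
    """
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(counts)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            if 0 <= i + k < n:
                acc += w * counts[i + k]
        smoothed.append(acc)
    return smoothed


def fourier_features(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients, scaled by 1/len."""
    n = len(signal)
    feats = []
    for f in range(n_coeffs):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                for t, x in enumerate(signal))
        feats.append(abs(c) / n)
    return feats


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical binned spike counts for two neurons (16 bins each).
x = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
y = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

fx = fourier_features(gaussian_smooth(x, sigma=1.0), n_coeffs=4)
fy = fourier_features(gaussian_smooth(y, sigma=1.0), n_coeffs=4)
d_xy = euclidean(fx, fy)
```

Each neuron becomes one point in a low-dimensional feature space, so a Rips-style construction is at least possible, but the comparison is neuron by neuron: nothing in these features records which neurons fired together.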
I thought another approach might be to use the “persistent vineyard” approach discussed in Munch’s thesis, Applying Persistent Homology to Time Varying Systems. There, Munch describes methods for combining many “slices” of the persistent homology of a point cloud as it varies (continuously). But the point cloud I have at each “slice” is very sparse: it’s sitting in some finite product space!
Okay, I’m not throwing up my hands yet. These papers look promising. But to make progress we really have to dig back into persistent homology and see what this technique is really doing, and see what the “natural” modification for time-series might be.