This work is about recording from tetrodes, one of a class of
multielectrode devices with overlapping "detection volumes", that is, each
recorded neuron has a detectable signal on more than one of the electrodes. Such recording
devices are invaluable for many reasons:
I. The Detection Problem
A serious problem faced by experimenters is that of sorting spikes. In the
multielectrode case in particular, one must be able to identify many cells simultaneously.
However, the issue of spike detection is also a very important one and is inseparable
from the spike-sorting problem, although it has received less attention.
Finding spikes:
- There is always some sort of threshold, e.g. raw amplitudes, matched filters, energy,
etc. To minimize the number of missed spikes one tries to set the threshold as low as
possible.
- But, not all detected spikes are clusterable. Some of these are low amplitude noise
events (garbage). At low thresholds these noise events dominate. Their number rises
supralinearly with decreasing threshold level.
Criteria of performance:
- In the context of multielectrode recordings one measure is the number of clusterable
(``useful'') spikes relative to the total number detected.
Features of the Data
- High density of events near the origin.
- Elongated clusters.
Why elongated? Two possibilities are:
- Intrinsic amplitude variability (e.g. bursting cells).
- Common cross-channel noise.
These possibilities have different implications:
Our observations from recordings in cat visual cortex and LGN are closer to case B).
We therefore predict a high degree of cross-channel correlation. We observe between
electrode correlation coefficients in the range of 0.7-0.9.
This common source noise is truly neurobiological in origin. We have checked that it
cannot be due to, e.g. stray capacitances between electrodes. We have also verified that
this cross-channel correlation is not due to some source such as a floating reference
signal. Recordings from two tetrodes simultaneously in different regions of the cortex
show large inter-channel correlations on each tetrode but smaller cross-tetrode
correlations.
Such common noise makes biological sense: it is presumably due to the
spikes of thousands of nearby neurons outside the electrodes' detection volumes. This
source is largely common to the four electrodes.
Optimizing Detection
The thresholding surface can be determined from the cross-channel covariance matrix.
This also accounts for different channel sensitivities.
- Estimate the cross channel covariance matrix, C, from random data samples
or on-line.
- Taking the channel voltages Vi as a 4-vector, the thresholding
surface is:

Numerical Comparison of Thresholding Procedures
Below we show the results of the comparison of methods for a typical sample of data
measured in cat LGN in our lab.
Both graphs show the percentage of missing and clusterable spikes for the two
thresholding methods.
By percent clusterable, we refer to the number of spikes that can be put into separated
clusters compared to the total number of spikes at a fixed threshold level. Missing spikes
at a given threshold level are defined to be the lost spikes in a cluster that were
present in that cluster at the lowest possible threshold setting.
Different methods of comparison:
- Compare at fixed threshold factor f. This directly compares different models of
the noise distribution. Obviously the hypercubical model is very poor, yet this is what is
used with the standard thresholding procedure. From the first graph below we see that at 4
sigma one is missing less than 10% of spikes in the elliptical case but 40% in the
rectangular case!
- Compare the ratio of clusterable to unclusterable events for both methods for a fixed
total number of spikes. In order to avoid missing more than 5% of spikes, one must analyze
50 - 100% more spikes in the rectangular case and the percentage of clusterable spikes
drops from 70 to 40%.

Visual Comparison of Thresholding Procedures
The figures below compare visually the performance of the hyperellipsoidal and
hypercubical thresholding methods, using an automatic clustering procedure. For each
method, the raw amplitude projections as well as the Hadamard rotations of these
projections are shown.
The latter is an orthogonal transformation, H into sum and difference
coordinates:
This, to a large degree, rotates away the cross channel correlation in some projections,
yielding more circular clusters. Similarly one could rotate to the eigenbasis of the
covariance matrix.
Benefits for Automatic Clustering
These figures also illustrate the effect of clipping on the inferred probability
densities. Note that in the hypercubical case, the garbage cluster based on the
cross-channel covariance matrix is ineffective.
Problems with the typical thresholding methods:
- If clusters are clipped then they are non-Gaussian and difficult to model.
- By lowering a hypercubical threshold low enough to avoid significant clipping, one
quickly becomes overwhelmed by the sheer number of spikes. Handling this amount of data is
computationally infeasible and/or inefficient and the large number of garbage events can
"break" some clustering algorithms.
Clearly the ellipsoidal thresholding method helps alleviate all of these problems.
II. Automatic Spike Classification
An important question is: in what space to classify?
Two classes of possibilities:
- Full Waveforms
These contain all the information about the spike train but not all of it is useful.
For sorting one needs to reduce the high dimensionality of the waveform space by some
method such as principal components analysis (PCA) or to simplify the probability model
representing the data, e.g. by making assumptions about the noise.
- Features of spikes
These encode physically relevant aspects of the spikes as well as those which give good
discrimination between cells. Some of these features may be nonlinear in the waveform
voltage trace.
We will pursue the latter approach here.
Types of Features:
Features we have looked at include: channel amplitudes: A-, A+;
peak-to-peak width, T; full width at half A-: W; slopes (A-/W) and
interchannel delays.
We have good evidence from both manual and automatic clustering methods that A-
gives good separation of spikes and that using only this feature gives a good "first
order" solution to the sorting problem. The data shown in Section I were clustered
using only this feature in a mixture of Gaussians model.
However, other aspects of the waveform may be relevant. Here we study
the additional inclusion of both A+ and T in a probability model. Based on
preliminary investigations, the other features gave either redundant information or
provided less discrimination.
Automatic Sorting Method
- Feature detection
Determining features reliably is a nontrivial problem and can be sensitive to the
quality of the data set such as proper filtering.
- Outlier detection
Clustering results based on a training set of data will be more robust if outliers in data
are removed from this set. Such outliers include overlapping waveforms and waveforms with
inconsistencies in their extracted features. These outliers are returned to the data set
after inferring the best probability model for further processing such as overlap
decomposition.
- Probability model
The simplest way to try to cluster data in larger dimensional feature spaces is to use a
general mixture of Gaussians, with unconstrained covariance matrices, i.e., we allow the
clusters in features to have different shapes for different cells.
The success of such a model in describing the data depends on the choice of features. If
the set of features are not approximately Gaussian distributed, the model will not be a
good fit.
- Implementation
We model the data (after removing a selected class of outliers)
using a general mixture of Gaussians in the space of features.
- Two fixed garbage clusters are added: one uniform, the other with zero mean and
covariance determined in part by the cross-channel covariance matrix up to a scale factor.
- Model parameters are determined using the Expectation-Maximization (EM) algorithm to
maximize a penalized likelihood.
- A prior covariance for the clusters is given favoring clusters that are not overly
broad. This prior weighting decays with successive iterations of the EM algorithm, as
clusters are allowed to find their most likely size and shape.
- Models of different numbers of clusters (cells) are compared via the Bayesian
Information Criterion (BIC), a complexity penalty equivalent to a broad prior on the
number of clusters
Sample results of our feature space clustering are shown below for data obtained in cat
visual cortex.
Discussion and Results
- Overall the automatic method does a good job in picking up all of the cells in the data.
- The mean waveforms provide templates for the individual cell's waveform shape which can
be used for further classification and overlap decomposition.
- As indicated on the mean waveform figures, we see that some cells are represented by
several clusters. In some cases there is some intrinsic variability of the spikes
(presumably non-Gaussian) which cannot easily be modeled by a single Gaussian
cluster. In the case of the cyan and dark green clusters we have confirmed that the
variability is not due to bursting, while for the magenta and blue clusters the multiple
clusters clearly represent bursting behavior of the cell.
- While extra features sometimes do not produce better final results than the amplitudes
alone (compare to Section I results), they do provide extra information that can speed the
convergence of the clustering algorithm.
- So far we have only discussed the question of statistical inference in determining the
classification problem. A second stage is to apply some decision theoretic rules for
cluster merging and outlier cluster removal.
Recent and Future Directions
- Bayesian inference methods instead of maximum likelihood. These can better incorporate
prior information about the clusters and enable better model comparison for determining
the number of cells.
- Some of the intrinsic variability comes from bursting behavior. We have implemented a
simple iterative scheme to remove burst spikes from the training data set for a
better Gaussian fit. We are also investigating adding inter-spike interval information to
the probability model to better identify burst spikes.
- An obvious way to combine some of the clusters is to use a hierachical mixture of
Gaussians to infer the model, where the means of the extra Gaussians are constrained to be
equal or, in a Bayesian framework, biased to be nearly equal.
- We have also implemented methods to cluster in the full waveform space using a local
probabilistic form of principal components analysis. Previous results obtained were
similar to the results obtained using only the A- amplitude. We would like to
apply our outlier removal methods to this analysis to see how much the performance
improves.
- Improved feature detection by fitting waveforms using Gaussian process interpolation
methods.
- Treat overlaps in the data.
Conclusion
- Tetrode channels are highly correlated from neurobiological sources.
- Using a thresholding surface that fits the empirical noise distribution has great
benefits:
- Significantly less loss of "useful" events.
- Better cluster distinguishability.
- Improvements in automatic clustering performance.
- Higher quality information for studying the nature of local cortical circuitry.
- Automatic clustering in the space of spike features is a practical method for sorting
spikes when coupled with our thresholding technique, good feature determination and good
outlier detection.
References:
1.Gray C., Maldonado P., Wilson M. and McNaughton B., Tetrodes markedly improve the
reliability and yield of multiple single-unit isolation from multi-unit recordings in cat
striate cortex, J. Neur. Meth., 1995, 63(1-2), p.43-54.
2.Rebrik S., Tzonev S. and Miller K. D., Analysis of Tetrode Recordings in Cat Visual
System, in: Proceedings of CNS97 (Computation and Neural Systems Meeting, Big Sky Montana,
July 1997), J.M Bower, Ed., Plenum Press, 1998.
Acknowledgements:
Work supported by grant R01-NS33787 from the NINDS, by the Searles Scholars' Program,
and by the Alfred P. Sloan Foundation.