CCH normalization / y-offset
Hey!
@morales-gregorio and I stumbled across the following issues when using `cross_correlation_histogram` to calculate the time-lagged auto-correlation function. These might not be actual issues but rather boil down to different conventions with their respective (dis-)advantages. I just want to raise them here to make you aware.
If I understand the `cross_correlation_histogram` function correctly, it calculates

CCH_{s1,s2}[k] = \sum_n s1[n+k] \cdot s2[n]

where s1 and s2 are the arrays returned by `BinnedSpikeTrain.to_array()`. All the following points are based on this.
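For concreteness, a minimal toy sketch of that definition (plain numpy, not the elephant implementation; `np.correlate` in 'full' mode happens to use the same convention):

```python
import numpy as np

# Toy binned spike trains (spike counts per bin); in elephant these would come
# from BinnedSpikeTrain.to_array(), here they are just hand-written arrays.
s1 = np.array([0, 1, 0, 1, 1, 0, 0, 1])
s2 = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# Raw, unnormalized CCH as defined above: CCH[k] = sum_n s1[n + k] * s2[n]
cch_raw = np.correlate(s1, s2, mode='full')
lags = np.arange(-(len(s2) - 1), len(s1))  # k = -(N-1), ..., N-1

for k, c in zip(lags, cch_raw):
    print(f"lag {k:+d}: {c}")
```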
Normalization (time)
For me, a (binned) spike train has dimension 1/time, such that its integral equals the total number of spikes. Hence, I expect the CCH to have the dimension 1/time^2. However, in agreement with the above definition, it is dimensionless. This amounts to a factor of `1/binsize**2` difference.
Normalization (convolution)
The \sum_n in the CCH depends on the length of the binned spike trains. Thus, a realization of a stationary process that is twice as long as another realization yields a CCH that differs by a factor of two. To avoid this, I think the CCH should be divided by the length of the binned spike trains. This amounts to a factor of `binsize/T` difference.
y-offset
For long times, the time-lagged correlation should (for reasonable processes) decay to zero. However, with the above definition it decays to the squared rate because s1 and s2 are not centered. This amounts to subtracting the product of the mean values of s1 and s2.
Taking all three points into account, I arrive at
CCH_new = CCH_old / binsize / T - rate**2
for the auto-correlation (for the cross-correlation the rate might of course be different).
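In code, the combined correction would look roughly like this (a sketch with a toy Poisson spike train; `binsize` and the array length are assumptions for the example, not values used anywhere in elephant):

```python
import numpy as np

binsize = 0.005                                  # bin width in s (assumed)
counts = np.random.poisson(lam=0.1, size=2000)   # toy binned spike train
T = len(counts) * binsize                        # total duration in s
rate = counts.sum() / T                          # mean firing rate in 1/s

# Raw auto-correlation histogram: CCH_old[k] = sum_n s[n+k] * s[n]
cch_old = np.correlate(counts, counts, mode='full')

# Normalization (time) and (convolution) combine to 1/(binsize * T);
# subtracting rate**2 removes the y-offset, so CCH_new decays to zero.
cch_new = cch_old / binsize / T - rate**2
```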
Just to be clear: none of the above touches on the question of whether the variance should be divided out, imho.
It took me a while to think through what you wrote and what we actually do in the CCH.
As you saw, the CCH function has a `cross_corr_coef` flag. However, it does something slightly different from your proposal. The equation for computing the cross-correlogram of Poisson spike trains seems to have been established since 1982 and is used in papers interchangeably, so we stick to the equation given in our book, which in turn refers to Eggermont 1992a.
However, if you look closely at that equation and our implementation of `cross_corr_coef()`, you find differences. That makes me wonder whether the implementation is correct at all. It should be correct, though, because we have a test that compares the CCH + `cross_corr_coef` output with the `corrcoef()` function.
After you switched to `cross_corr_coef=True` in `spike_train_timescale()`, did it start to work as you expect?
Thanks for looking into this! For the timescales, `cross_corr_coef=True` works fine. There, I only need the y-offset corrected, which `cross_corr_coef=True` does, and I don't care about the normalizations because I normalize the CCH to one (https://github.com/NeuralEnsemble/elephant/blob/master/elephant/spike_train_correlation.py#L816).
I thought about the normalization a bit more. Based on Gruen & Rotter (2010), chapter 5, it is imho consistent with Eq. (5.2) and (5.4) to neglect both the y-offset and the normalization (time). Both are just conventions, I think, and one simply has to be consistent (and give a reference). However, I think the normalization of the convolution is necessary. Eq. (5.4) reads

CCH_{s1,s2}[k] = \frac{1}{N} \sum_{n=1}^{N} s1[n] \cdot s2[n+k]

while the code calculates

CCH_{s1,s2}[k] = \sum_{n=1}^{N} s1[n] \cdot s2[n+k]

as far as I can see.
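In code, the difference is just the prefactor, i.e. (toy arrays again, not the actual implementation):

```python
import numpy as np

s1 = np.array([0, 1, 0, 1, 1, 0, 0, 1])
s2 = np.array([1, 0, 0, 1, 0, 1, 0, 0])
N = len(s1)

# Lag-sign convention aside, the only difference is the 1/N prefactor.
cch_code = np.correlate(s1, s2, mode='full')  # no prefactor, as in the code
cch_eq54 = cch_code / N                       # with the 1/N of Eq. (5.4)
```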
Hm, there are different conventions for how to normalize the CCH, see https://de.mathworks.com/help/matlab/ref/xcorr.html.
If `cross_corr_coef` is False, the current implementation follows the `scaleopt='none'` strategy even though, according to the book, we should have used the `scaleopt='biased'` strategy. This is strange, indeed. We even have a test that checks our CCH result against a bare `np.correlate()` without any normalization.
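To make the comparison concrete, here is a rough numpy sketch of what the xcorr-like scaling options would mean for a CCH of two equally long binned spike trains (my own toy code, not the elephant or MATLAB implementation):

```python
import numpy as np

def scaled_cch(s1, s2, scaleopt='none'):
    """Cross-correlation of two equal-length 1D arrays with xcorr-like scaling."""
    assert len(s1) == len(s2)
    N = len(s1)
    lags = np.arange(-(N - 1), N)
    cch = np.correlate(s1, s2, mode='full').astype(float)
    if scaleopt == 'none':      # raw sums -- what the CCH currently returns
        return lags, cch
    if scaleopt == 'biased':    # divide by N, i.e. the 1/N of Eq. (5.4)
        return lags, cch / N
    if scaleopt == 'unbiased':  # divide by the number of overlapping bins per lag
        return lags, cch / (N - np.abs(lags))
    if scaleopt == 'coeff':     # normalize so zero-lag autocorrelations equal 1
        return lags, cch / np.sqrt(np.dot(s1, s1) * np.dot(s2, s2))
    raise ValueError(f"unknown scaleopt: {scaleopt}")
```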
I've opened PR https://github.com/NeuralEnsemble/elephant/pull/277, which adds this flag to another, similar function of ours, `cross_correlation_function()`. Do you think it makes sense to add a `scaleopt` flag here as well?
Yes, I think adding a `scaleopt` flag sounds like a good solution to this. I'd argue that `scaleopt='biased'` should be the default here, imho.
This was fixed by #277. If not, feel free to reopen this issue.