
CCH normalization / y-offset

Open AlexVanMeegen opened this issue 5 years ago • 4 comments

Hey!

@morales-gregorio and I stumbled across the following issues when using cross_correlation_histogram to calculate the time-lagged auto-correlation function. These might not be actual issues but boil down to different conventions with their respective (dis-)advantages. Just want to raise it here to make you aware.

If I understand the cross_correlation_histogram function correctly, it calculates

CCH_{s1,s2}[k] = sum_n s1[n+k] * s2[n]

where s1 and s2 are BinnedSpikeTrain.to_array()s. All the following points are based on this.
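As a sanity check of that definition, here is a minimal NumPy sketch (`s1` and `s2` are hypothetical binned spike-count arrays standing in for `BinnedSpikeTrain.to_array()`, not elephant objects); `np.correlate` with `mode='full'` computes exactly this sum, ordered from lag -(N-1) to N-1:

```python
import numpy as np

# hypothetical binned spike counts (stand-ins for BinnedSpikeTrain.to_array())
s1 = np.array([1, 0, 2, 1, 0], dtype=float)
s2 = np.array([0, 1, 1, 0, 2], dtype=float)

N = len(s1)
lags = np.arange(-(N - 1), N)

# direct evaluation of CCH[k] = sum_n s1[n + k] * s2[n]
cch_direct = np.array([
    sum(s1[n + k] * s2[n] for n in range(N) if 0 <= n + k < N)
    for k in lags
])

# np.correlate orders its 'full' output from lag -(N-1) to N-1
cch_numpy = np.correlate(s1, s2, mode='full')

print(np.allclose(cch_direct, cch_numpy))  # → True
```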

Normalization (time) For me, a (binned) spiketrain has dimension 1/time, such that its integral equals the total number of spikes. Hence, I expect the CCH to have the dimension 1/time^2. However, in agreement with the above definition, it is dimensionless. This amounts to a factor 1/binsize**2 difference.

Normalization (convolution) The sum_n in the CCH depends on the length of the binned spiketrains. Thus, for a stationary process, a realization that is twice as long as another yields a CCH that differs by a factor of two. To avoid this, I think the CCH should be divided by the length of the binned spiketrains. This amounts to a factor binsize/T difference.

y-offset For long times, the time-lagged correlation should (for reasonable processes) decay to zero. However, in the above definition it decays to the squared rate because s1 and s2 are not centered. This amounts to a subtraction of the product of the mean values of s1 and s2.

Taking all three points into account, I arrive at

CCH_new = CCH_old / binsize / T - rate**2

for the auto-correlation (for the cross-correlation the rate might of course be different).
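The proposal can be sketched for a hypothetical binned Poisson spike train (the names `binsize`, `T`, and `rate` here are the assumed bin width, duration, and estimated rate, not elephant variables):

```python
import numpy as np

rng = np.random.default_rng(0)
binsize = 0.005                                # bin width in s (assumed)
s = rng.poisson(0.1, size=2000).astype(float)  # hypothetical binned Poisson train
N = len(s)
T = N * binsize                                # total duration in s
rate = s.sum() / T                             # estimated firing rate in 1/s

# CCH_old[k] = sum_n s[n + k] * s[n]
cch_old = np.correlate(s, s, mode='full')

# proposed: normalize to dimension 1/time^2 and remove the squared-rate offset
cch_new = cch_old / (binsize * T) - rate**2
```

For small nonzero lags, `cch_new` then fluctuates around zero, as expected for a Poisson process; at lags approaching N, dividing by T (rather than by the number of overlapping bins) still leaves an edge bias.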

Just to be clear: none of the above amounts to the question whether the variance should be divided out imho.

AlexVanMeegen avatar Oct 30 '19 18:10 AlexVanMeegen

It took me a long time to think about what you wrote and what we actually do in the CCH.

As you saw, the CCH function has a cross_corr_coef flag. However, it does something slightly different from your proposal. The equation for computing the cross-correlogram for Poisson spike trains seems to have been established since 1982 and is used in papers interchangeably, so we stick to the equation given in our book, which in turn refers to Eggermont 1992a.

But if you look closely at the equation and our implementation of cross_corr_coef(), you will find differences. That makes me wonder whether the implementation is correct at all. It should be correct, though, because we have a test comparing the CCH + cross_corr_coef output with the corrcoef() function.

After you switched to cross_corr_coef=True in spike_train_timescale(), does it work as you expect?

dizcza avatar Nov 15 '19 10:11 dizcza

Thanks for looking into this! For the timescales, cross_corr_coef=True works fine. There, I only need the y-offset corrected, which is done in cross_corr_coef=True, and I don't care about the normalizations because I normalize the CCH to one (https://github.com/NeuralEnsemble/elephant/blob/master/elephant/spike_train_correlation.py#L816).

I thought about the normalization a bit more. Based on Gruen & Rotter (2010) chapter 5, it is imho consistent with Eq. (5.2) and (5.4) to neglect both the y-offset and the normalization (time). Both are just conventions I think and one simply has to be consistent (and put a reference). However, I think the normalization of the convolution is necessary. Eq. (5.4) reads

CCH_{s1,s2}[k] = \frac{1}{N} sum_{n=1}^N s1[n] * s2[n+k]

while the code calculates

CCH_{s1,s2}[k] = sum_{n=1}^N s1[n] * s2[n+k]

as far as I can see.
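The effect of the missing 1/N shows up when comparing two realization lengths, e.g. at zero lag (a toy sketch with hypothetical Poisson spike counts; without the 1/N of Eq. (5.4), doubling the length roughly doubles the value):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical stationary binned spike trains, one twice as long
s_short = rng.poisson(0.2, size=1000).astype(float)
s_long = rng.poisson(0.2, size=2000).astype(float)

def cch_zero_lag(s, normalized):
    raw = float(np.dot(s, s))  # CCH[0] = sum_n s[n] * s[n]
    return raw / len(s) if normalized else raw

ratio_raw = cch_zero_lag(s_long, False) / cch_zero_lag(s_short, False)
ratio_norm = cch_zero_lag(s_long, True) / cch_zero_lag(s_short, True)

print(ratio_raw)   # close to 2: the raw CCH scales with the data length
print(ratio_norm)  # close to 1: the 1/N-normalized CCH does not
```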

AlexVanMeegen avatar Nov 15 '19 14:11 AlexVanMeegen

Hm, there are different conventions on how to normalize CCH, see https://de.mathworks.com/help/matlab/ref/xcorr.html

If cross_corr_coef is False, the current implementation follows the scaleopt='none' strategy even though, according to the book, we should have used the scaleopt='biased' strategy. This is strange, indeed. We even have a test that checks our CCH result against a bare np.correlate() without any normalization.
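For reference, the MATLAB-style scaling options from that page could be sketched like this for equal-length inputs (a hypothetical helper for illustration, not elephant's API):

```python
import numpy as np

def xcorr(a, b, scaleopt='none'):
    """Cross-correlate equal-length 1-d arrays with MATLAB-style scaling."""
    n = len(a)
    raw = np.correlate(a, b, mode='full')
    lags = np.arange(-(n - 1), n)
    if scaleopt == 'none':      # raw sum, the current CCH behavior
        return raw
    if scaleopt == 'biased':    # divide by the fixed length N
        return raw / n
    if scaleopt == 'unbiased':  # divide by the number of overlapping bins
        return raw / (n - np.abs(lags))
    if scaleopt == 'coeff':     # zero-lag autocorrelation becomes 1
        return raw / np.sqrt(np.dot(a, a) * np.dot(b, b))
    raise ValueError(f"unknown scaleopt: {scaleopt}")
```

With 'coeff', for instance, xcorr(a, a, 'coeff') equals 1 at zero lag for any nonzero a.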

I've opened PR https://github.com/NeuralEnsemble/elephant/pull/277 that adds this flag to another similar function of ours - cross_correlation_function(). Do you think it makes sense to add scaleopt flag here as well?

dizcza avatar Dec 06 '19 16:12 dizcza

Yes, I think adding a scaleopt flag sounds like a good solution to this. I'd argue that scaleopt='biased' should be the default here imho.

AlexVanMeegen avatar Dec 13 '19 13:12 AlexVanMeegen

This was fixed by #277. If not, feel free to reopen this issue.

Moritz-Alexander-Kern avatar Nov 28 '22 07:11 Moritz-Alexander-Kern