elephant
elephant copied to clipboard
[Bug] optimal_kernel_bandwidth yields extremely large kernel widths
Describe the bug
The estimation of the instantaneous_rate with kernel="auto"
of a very long spiketrain (~650 s) yielded a very smooth and flat rate. It was tracked down to the estimation of the optimal_kernel_bandwidth (the width of the Gaussian kernel, fixed for the whole duration), which yielded a bandwidth of ~22 s for spiketrain with a rather low firing rate.
After some investigation, it was noticed that the optimal bandwidth strongly depends on the duration of the spiketrain. One reason for this might be the hard-coded number of bins (1000) here:
https://github.com/NeuralEnsemble/elephant/blob/67dd3f3ef86161c0e70faf6e50690bdfd591978c/elephant/statistics.py#L1576-L1597
Although, the option to give an array of times
on which the optimization of the kernel is performed exists, it is not very accessible. (See original publication Shimazaki, H. & Shinomoto, S. Kernel bandwidth optimization in spike rate estimation. J Comput Neurosci 29, 171–182 (2010). for details)
Further notice: The implemented optimal_kernel_bandwidth estimates one fixed kernel bandwidth for the given spiketrain and is not adaptive over time! The originating code https://github.com/cooperlab/AdaptiveKDE also implements the adaptive case and could be included into elephant.
To Reproduce
- Get spiketimes
- Call
elephant.statistics.optimal_kernel_bandwidth(spiketimes)
A minimal example is provided in the attached files: issue_optimal_kernel_bandwidth.zip
Expected behavior The order of magnitude should be reasonable and if this is exceeded a warning should be raise.
Environment
- OS (e.g., Linux): Ubuntu
- How you installed elephant (
conda
,pip
, source): conda/pip - Python version: 3.9.7
-
neo
python package version: 0.10.2 -
elephant
python package version: 0.10.0
Thank you for reporting this and your effort to contribute to elephant.
I have extended your minimal example to illustrate the issue in connection with elephant.statistics.instantaneous_rate
.
import pickle
import quantities as pq
import elephant
import matplotlib.pyplot as plt
# Load example spiketrains
high_firing_rate_st, low_firing_rate_st = pickle.load(
open('spiketrain_examples.pkl', 'rb'))
# firing rates
print(high_firing_rate_st.annotations)
print(low_firing_rate_st.annotations)
{'mean_firing_rate': array(22.00288951) * 1/s}
{'mean_firing_rate': array(3.85978253) * 1/s}
# Look at kernel bandwidth estimation
# spiketrain = high_firing_rate_st
spiketrain = low_firing_rate_st
spiketimes = spiketrain.times.rescale(pq.s).magnitude
# Choose example spiketrain
kernel_bandwidth = elephant.statistics.optimal_kernel_bandwidth(spiketimes)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
f'estimated to be {kernel_bandwidth["optw"]} s')
The optimal kernel bandwidth for the full length spiketrain was estimated to be 22.401845783799754 s
# Time slice the spiketrain to a 5 s duration
time_sliced_st = spiketrain.time_slice(t_start=1*pq.s, t_stop=6*pq.s)
spiketimes_sliced = time_sliced_st.times.rescale(pq.s).magnitude
kernel_bandwidth_sliced = elephant.statistics.optimal_kernel_bandwidth(
spiketimes_sliced)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
f'estimated to be {kernel_bandwidth_sliced["optw"]} s')
The optimal kernel bandwidth for the full length spiketrain was estimated to be 0.5003558368563431 s
# Look at instantaneous rate
def plot_rate(inst_rate):
plt.plot(inst_rate.times.simplified.magnitude,
inst_rate.magnitude, color='orange',
label='instantaneous rate')
plt.legend()
plt.title(f"{inst_rate.annotations['kernel']['type']}: "
f"sigma={inst_rate.annotations['kernel']['sigma']}, "
f"sampling_period={int(sampling_period.item())} ms")
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.show()
sampling_period = 10 * pq.ms
# use kernel "auto"
rate_auto = elephant.statistics.instantaneous_rate(
spiketrain,
sampling_period=sampling_period,
kernel="auto")
# plot result
plot_rate(rate_auto)
Figure 1:
# calculate rate using kernel bandwidth based on time slice
kernel = elephant.kernels.GaussianKernel(
sigma=kernel_bandwidth_sliced["optw"] * pq.s)
rate_slice = elephant.statistics.instantaneous_rate(
spiketrain,
sampling_period=sampling_period,
kernel=kernel)
# plot result
plot_rate(rate_slice)
Figure 2:
The two figures above show the difference for large and small kernel bandwidths. If I understand correctly Figure 2 is the desired result for this case, Figure 1 shows a rate estimate, which is too smooth. (bandwidth too large)
possible solutions
- rise a warning to the user when
elephant.statistics.optimal_kernel_bandwidth(spiketimes)
is used and the resulting bandwidth is too large.- In order to rise this warning in
optimal_kernel_bandwidth
, the following question comes to mind:
Is there a way to estimate if the bandwidth is reasonable ? or would it be a fixed value ?
- In order to rise this warning in
- use the adaptive kernel functionality proposed in PR #462.
- change the hard coded number of bins (Lines 1576 to 1597) https://github.com/NeuralEnsemble/elephant/blob/67dd3f3ef86161c0e70faf6e50690bdfd591978c/elephant/statistics.py#L1576-L1597
I guess further steps should be discussed in connection to PR #462, thanks again for contributing this.
Hi Moritz,
thanks for having a look at it so quickly! Just to clarify a few points:
- What an expected or reasonable bandwidth should be, is a very valid question and I don't have a proper answer to that.
- I would consider a 500 ms kernel bandwidth still as quite large.
- The PR #462 does not solve the issue yet! It was just created while trying to solve it. (The hard-coded number of bins are definitely not ideal, but seem not to cause the main problem here)
- This method of estimating the instantaneous rate was, traditionally, applied to the spiketrains of sets of trials, which were short in duration. It might be a limitation of the method itself.
Having said that, I agree that we can move the discussion to PR #462 .