elephant
elephant copied to clipboard
Generalize cross correlation histogram function
Hi,
I was using the CCH function for some analysis and was very limited by the fact that it only accepted BinnedSpiketrains, because I also wanted to use it to compute CCH of non-spike data.
I went ahead and started modifying the function to enable using numpy arrays and encountered many many issues that I think made the code more difficult to understand and prone to unexpected behaviour. Here is the list of the things I found confusing and changed in this PR, we can roll some back if you still think they are worth having:
- A lot of the code is there to deal with cross correlating arrays of different shape. But I cannot think of any use case where anyone would want to do that. The neurons should have been recorded simultaneously and it makes no sense to me to correlate one time series with another after one of them has finished. I removed all functionality relating to this, which already simplified the code greatly.
- Another big issue is the use of spiketrains that have non-aligned t_start. I think the resulting statistics from spikes jumping around bins is not necessarily linearly predictable. The best practice here should be to slice the spiketrains before binning to the same t_start, hence I removed the functionality and included the strict criterion of having the same t_start.
- Why include a kernel function in this function? Seems like a postprocessing step that people should do to their result if they want to. I suggest removing it, if someone wants to use a kernel, they can do so with the output of the function.
- Same goes to binarize. This is a preprocessing step, and an option of BinnedSpikeTrain, the user should apply it before feeding the data to this function, seems ad hoc and dangerous to have this kind of functionality in multiple places.
- Finally, it was unclear to me what the border correction was doing. It would only kick in for the different length spiketrains case (which I don't allow anymore), so I removed it.
Stylistically, there was a class with helper functions. But the class was only instantiated once and the helper functions therein only used once. I see no point in scattering the code around in this way for functions that are only used once. Since all the options I removed have significantly shortened the code I put it all into the main function, hoping that it is clear what the code does at any given time without having to jump back and forth between helpers and the main function.
Also there were examples interleaved in the code comments, which made the code more confusing to me and I have never seen in any scientific programming language, so I also removed those. I added comments where I saw fit to clarify what certain code blocks are supposed to do.
I hope all these changes improve the usability and clarity of the cross_correlation_histogram function. I know you worked hard to include some of this functions, but to me they made the function more difficult to understand and use, while introducing a lot of uncertainty about what really was happening with some of the pre/post-processing steps.
Let me know what you think, I will work on the tests once the main function is agreed upon.
Probably of interest to @Kleinjohann
Best, Aitor
Hello @morales-gregorio! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2024-02-05 14:53:44 UTC
Hey Aitor,
thank you for contributing to Elephant.
This is what I understand from your description and my "first glance review":
-
Simplification of Code:
- Removed code related to cross-correlating arrays of different shapes, as this scenario is not expected.
- Eliminated functionality for handling spiketrains with non-aligned t_start, emphasizing the need for the same t_start.
- Suggested removing the kernel function and binarization, as they should be post-processing or preprocessing steps respectively.
-
Border Correction:
- Removed the border correction feature, as it only applied to different-length spiketrains (which are no longer allowed).
-
Code Organization:
- Integrated helper functions into the main function for improved code clarity.
- Removed interleaved examples from code comments for better readability.
- Added comments where needed to clarify code blocks.
I'll need some time to conduct a more thorough review of this PR, and, importantly, I believe it's essential for @Kleinjohann and @mdenker to also examine it.
Several of these points are not directly related to software development or Python programming; instead, they pose questions of a neuroscientific nature e.g. A lot of the code is there to deal with cross correlating arrays of different shape. But I cannot think of any use case where anyone would want to do that. Consequently, I'm hesitant to make comments or decisions regarding these particular aspects.
todo:
- [ ] change unit tests accordingly
Hi, any updates? I would wait for feedback and approval of changes before putting work on updating the unit tests
We discussed this in the meeting on 2024.02.05
- It is not clear if there is a good use case for differently shaped arrays.
- Numpy allows this in its function
- @mdenker will think about this use case and whether we really need it.
- There exists both
cross_correlation_histogram()
andcross_correlation_function()
; cohesive framework merging these two functions and possibly alsospike triggered average
. @mdenker supports merging these functions in the new elephant version.
When refactoring the cross-correlation functionalities, also inspect TSPE (Total Spiking Probability Estimation) implementations in elephant.functional_connectivity
.
More specifically in elephant/functional_connectivity_src/total_spiking_probability_edges.py
:
https://github.com/NeuralEnsemble/elephant/blob/6363690cece6d2e9f10342511db1eee87d3863a8/elephant/functional_connectivity_src/total_spiking_probability_edges.py#L174-L193