
compute_synchrony_metrics update

Open chrishalcrow opened this issue 11 months ago • 5 comments

(Another go of #2528, after the SortingAnalyzer refactor)

Planning to standardise, correctness-test and optimise the quality metrics module. First up: the compute_synchrony_metrics function.

Two main things in this:

  • The generating functions synthesize_random_firings and synthesize_poisson_spike_vector were adjusted so that you can insert specific spikes, which is useful for testing. These are used in the new tests.
  • A new function compute_synchrony_counts is introduced, which computes the synchrony counts.

One thing I'm not sure about: The documentation said that compute_synchrony_metrics returns a dict, but I think we should export a namedtuple, which matches most other quality metrics.
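For illustration, a namedtuple-style result could look something like the sketch below (the field names sync_spike_2 etc. are an assumption chosen to illustrate the shape, not a statement of the PR's exact API):

from collections import namedtuple

# Hypothetical sketch of a namedtuple return value for compute_synchrony_metrics.
# Each field maps unit_id -> value for one synchrony size.
SynchronyMetrics = namedtuple("synchrony_metrics", ["sync_spike_2", "sync_spike_4", "sync_spike_8"])

metrics = SynchronyMetrics(
    sync_spike_2={0: 0.1, 1: 0.0},
    sync_spike_4={0: 0.0, 1: 0.0},
    sync_spike_8={0: 0.0, 1: 0.0},
)
print(metrics.sync_spike_2)  # access by field name, like most other quality metric outputs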

I adjusted the algorithm in compute_synchrony_counts, which should be easier to understand and is significantly faster. Here are some benchmarking results, with the times shown on a log-log plot to check the algorithm's complexity. About 30x faster!

[Figure: benchmarch_sync — log-log plot of benchmark timings vs spike train length]

Benchmarked using the following code:

import numpy as np
from time import perf_counter

from spikeinterface.core import synthesize_random_firings
from spikeinterface.qualitymetrics import get_synchrony_counts

unit_ids = np.arange(100)

# spike train lengths to test: 21 log-spaced values from 1e3 to 1e5 spikes
Ns = 10 ** np.arange(3, 5.1, 0.1)

times_by_size = []
how_many = 100  # repeats per spike train length

for train_length in Ns:

    times_list = []

    for _ in range(how_many):

        # 100 units at 10 Hz for train_length/1000 seconds gives ~train_length spikes in total
        spike_train = synthesize_random_firings(num_units=100, duration=train_length / 1000, firing_rates=10.0)

        # time only the synchrony counting, for synchrony sizes 2, 4 and 8
        start = perf_counter()
        get_synchrony_counts(spike_train, np.array((2, 4, 8)), list(unit_ids))
        end = perf_counter()

        times_list.append(end - start)

    # median over repeats to reduce timing noise
    times_by_size.append(np.median(times_list))
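(For reference, the log-log plot mentioned above can then be made with something like the snippet below; matplotlib is not shown in the code above, so this is just an assumed way of producing the figure.)

import matplotlib.pyplot as plt

# log-log plot of median timing vs spike train length; a straight line with
# slope ~1 indicates roughly linear scaling in the number of spikes
plt.loglog(Ns, times_by_size, marker="o")
plt.xlabel("number of spikes in spike train")
plt.ylabel("median time of get_synchrony_counts (s)")
plt.show()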

chrishalcrow avatar Mar 20 '24 10:03 chrishalcrow

Thanks @chrishalcrow

Can you check why tests are failing?

alejoe91 avatar Mar 20 '24 14:03 alejoe91

Thanks @chrishalcrow

Can you check why tests are failing?

Yes, it's to do with me originally changing the output from namedtuple to a dict (as documented). I'm changing back now...

chrishalcrow avatar Mar 20 '24 15:03 chrishalcrow

Tip: you can use numpy.logspace (https://numpy.org/doc/stable/reference/generated/numpy.logspace.html) to generate your samples.

Benchmark comments:

  • I think you should fix your seed in your benchmark explicitly.
  • I find the spike_length variable hard to think about. I suggest using time and showing us the units. Does it scale well to the relevant domain of hour-long sortings?
  • Finally, I would be curious to see how this scales with the number of units as well.

h-mayorquin avatar Mar 20 '24 17:03 h-mayorquin

Tip: you can use numpy.logspace (https://numpy.org/doc/stable/reference/generated/numpy.logspace.html) to generate your samples.

Benchmark comments:

  • I think you should fix your seed in your benchmark explicitly.
  • I find the spike_length variable hard to think about. I suggest using time and showing us the units. Does it scale well to the relevant domain of hour-long sortings?
  • Finally, I would be curious to see how this scales with the number of units as well.

Oooh, thanks for the logspace function.
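For reference, the sampling line in the benchmark above could then be written as the (equivalent) one-liner below:

import numpy as np

# 21 log-spaced spike train lengths from 1e3 to 1e5, equivalent to 10 ** np.arange(3, 5.1, 0.1)
Ns = np.logspace(3, 5, num=21)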

Maybe it would be helpful to say why the new algorithm is expected to be faster: I was able to replace the use of the complexity calculation, np.isin and np.count_nonzero (I think the complexity calculation is the heaviest of these) with some simple selections (like unique_spike_index[counts >= 2]).
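To illustrate the idea (a minimal, self-contained sketch of the approach described above, not the exact PR code; segments and duplicate spikes from the same unit are ignored for simplicity):

import numpy as np

# toy spike vector: a structured array with the fields used by SpikeInterface spike vectors
spikes = np.zeros(5, dtype=[("sample_index", "int64"), ("unit_index", "int64")])
spikes["sample_index"] = [10, 10, 20, 30, 30]
spikes["unit_index"] = [0, 1, 0, 1, 2]

# count how many units fire at each sample index
unique_spike_index, counts = np.unique(spikes["sample_index"], return_counts=True)

# select the samples where at least 2 units fire together, instead of using isin/count_nonzero
sync_samples_2 = unique_spike_index[counts >= 2]
print(sync_samples_2)  # [10 30]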

Spike_length is just the number of spikes in the spike vector. So for generic data, this increases linearly with time.

I’ve been using a 30-minute, 16-channel recording from my lab, which contains 70,000 spikes after sorting. Is this realistic? I’ve also been using this to make sure my algorithm is consistent with the old results. My tests go up to 100,000 spikes. What would be a good number to go up to?

I'm testing with unit number now, going up to 1000 units. Is that high enough? The complexity function changes its method at some point, and I’m trying to figure out how and when and will report back soon! But the new code is still substantially faster at e.g. 1k units with 100k spikes.

chrishalcrow avatar Mar 21 '24 14:03 chrishalcrow

Spike_length is just the number of spikes in the spike vector. So for generic data, this increases linearly with time.

On second thought, this seems more amenable to control and more relevant than duration. It is probably just me who does not have a good intuitive grasp of this metric, while duration I understand better. How are you controlling this, though, given that you are passing this parameter as duration divided by 1000?

I'm testing with unit number now, going up to 1000 units. Is that high enough? The complexity function changes its method at some point, and I’m trying to figure out how and when and will report back soon! But the new code is still substantially faster at e.g. 1k units with 100k spikes.

I have been told that 1000 units is already a very high number.

Bear in mind that the comments I made about the benchmark are just me being curious and making suggestions. As long as Sam and Alessio are OK with the rest, I would not bother if you don't care about it. The one thing that I really feel strongly about is not changing the signature and output of synthesize_poisson_spike_vector.

h-mayorquin avatar Mar 21 '24 20:03 h-mayorquin

Spike_length is just the number of spikes in the spike vector. So for generic data, this increases linearly with time.

On second thought, this seems more amenable to control and more relevant than duration. It is probably just me who does not have a good intuitive grasp of this metric, while duration I understand better. How are you controlling this, though, given that you are passing this parameter as duration divided by 1000?

I'm testing with unit number now, going up to 1000 units. Is that high enough? The complexity function changes its method at some point, and I’m trying to figure out how and when and will report back soon! But the new code is still substantially faster at e.g. 1k units with 100k spikes.

I have been told that 1000 units is already a very high number.

Bear in mind that the comments I made about the benchmark are just me being curious and making suggestions. As long as Sam and Alessio are OK with the rest, I would not bother if you don't care about it. The one thing that I really feel strongly about is not changing the signature and output of synthesize_poisson_spike_vector.

Hello,

I learned a few things from your comments and from trying to recreate my previous benchmarking. Thanks! Before, I refactored the code, creating a new function compute_synchrony_counts, and then did a speed test of the old algorithm against an optimised version of it inside that function. This is hard for others to reproduce, since you can’t easily run compute_synchrony_counts. So now I’m benchmarking against the full compute_synchrony_metrics function. This should be closer to “real world” use than before and easier for anyone to check. And I’ve hopefully made the wording/details of the plots more obvious.

The old benchmarking results are basically unchanged.

And I’ve now done more benchmarking, varying the number of units. First, here’s a plot of timings with increasing number of units N but fixed spike train length (of about 50k spikes: in our lab, this is the number of spikes in a typical 20-minute recording). The new algorithm grows linearly with the number of units. The old algorithm, which uses more complex functions, changes tactic at ~600 units (I think it’s np.unique that changes its tactic…). Both of its tactics look like O(N) algorithms. But even after it changes, it is still about 10x slower than the new one.

[Figure: benchmarch_sync_N — timings vs number of units, fixed spike train length]

I also tried keeping the duration and firing rates fixed. So here, as the number of units N increases, the length of the spike train increases at the same rate, but the density of spikes per unit of time for each unit (i.e. its firing rate) stays constant. Both algorithms grow as N^2, as expected (since the number of units and the train length are both increasing with N). Again, the old algorithm changes tactic, but the new code is still significantly faster.

[Figure: benchmarch_sync_N_sq — timings vs number of units, fixed duration and firing rates]

The code for both is below:

import numpy as np
from time import perf_counter

from spikeinterface.core import generate_ground_truth_recording, create_sorting_analyzer
from spikeinterface.qualitymetrics import compute_synchrony_metrics

times_by_size = []
sizes = []

how_many = 100  # repeats per number of units

duration = 10**1.7
# duration = 10**1.5 for the first plot
nums_units = np.arange(100, 1201, 100)

for num_units in nums_units:

    recording, sorting = generate_ground_truth_recording(
        durations=[
            duration  # for the first plot
            # duration / num_units * nums_units[0]  # for the second plot
        ],
        sampling_frequency=30_000.0,
        num_channels=6,
        num_units=num_units,
        generate_sorting_kwargs=dict(firing_rates=10.0, refractory_period_ms=4.0),
        noise_kwargs=dict(noise_level=5.0, strategy="tile_pregenerated"),
        seed=2205,
    )

    sorting_analyzer = create_sorting_analyzer(sorting, recording, format="memory", sparse=False)

    times_list = []
    for _ in range(how_many):

        # time the full quality metric computation on the analyzer
        start = perf_counter()
        compute_synchrony_metrics(sorting_analyzer)
        end = perf_counter()

        times_list.append(end - start)

    # median over repeats, and record the total number of spikes for this run
    times_by_size.append(np.median(times_list))
    sizes.append(len(sorting_analyzer.sorting.to_spike_vector()))
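To check the claimed scaling, one option (an assumption about how the log-log check can be done, not code from the PR) is to fit the slope of the timings from the loop above in log-log space:

import numpy as np

# fit log(time) against log(number of units): a slope near 1 suggests O(N),
# a slope near 2 suggests O(N^2)
slope, intercept = np.polyfit(np.log(nums_units), np.log(times_by_size), 1)
print(f"estimated scaling exponent: {slope:.2f}")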

chrishalcrow avatar Mar 26 '24 09:03 chrishalcrow

Unfortunately, I can't review the synchrony metrics changes as I am not really well acquainted with the method. Maybe @DradeAW or @zm711 can take a look?

@alejoe91 actually ported this from Elephant so he would be the best to review unless @DradeAW knows this super well. I never actually use this metric :)

zm711 avatar Mar 26 '24 17:03 zm711

Unfortunately, I can't review the synchrony metrics changes as I am not really well acquainted with the method. Maybe @DradeAW or @zm711 can take a look?

@alejoe91 actually ported this from Elephant so he would be the best to review unless @DradeAW knows this super well. I never actually use this metric :)

An important part of the update is the four test_synchrony_counts_* tests in test_metrics_functions.py. These are correctness tests for the main part of the algorithm. I make a small spike_train with specific spikes (e.g. two spikes which fire at the same time) and make sure the metric gives the right answer. If anyone can think of any good edge cases to test here, please comment!
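For illustration only, a test in this style might look like the sketch below (the test name, fixture and assertion are hypothetical; the actual PR tests call the new counting function and may use different shapes):

import numpy as np

def test_synchrony_counts_two_coincident_spikes():
    # hypothetical fixture: units 0 and 1 fire at the same sample, unit 2 fires alone
    spikes = np.zeros(3, dtype=[("sample_index", "int64"), ("unit_index", "int64")])
    spikes["sample_index"] = [100, 100, 200]
    spikes["unit_index"] = [0, 1, 2]

    # there should be exactly one sample with a synchrony event of size >= 2
    _, counts = np.unique(spikes["sample_index"], return_counts=True)
    assert np.count_nonzero(counts >= 2) == 1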

chrishalcrow avatar Mar 27 '24 15:03 chrishalcrow

Hello, I realised that the _add_spikes_to_spiketrain function was basically just checking a bunch of array lengths and then concatenating some arrays together, which were then used for testing. So it wasn't really adding much, and it was actually much simpler to get rid of it and concatenate directly in the tests. This got rid of 200 lines of code and simplified the pull request a lot!

chrishalcrow avatar Apr 15 '24 12:04 chrishalcrow

Hi. When we have time I would be happy to discuss this function. The synchrony here, if I understand correctly, requires the exact same timestamp, but we could have synchrony with very small lags. A lag can be due to small noise or a small time shift. So my main question is: how robust is this metric regarding these lags?

samuelgarcia avatar Apr 22 '24 18:04 samuelgarcia

Hi. When we have time I would be happy to discuss this function. The synchrony here, if I understand correctly, requires the exact same timestamp, but we could have synchrony with very small lags. A lag can be due to small noise or a small time shift. So my main question is: how robust is this metric regarding these lags?

Hello, indeed the metric is not robust to lags at all. This PR was just optimizing what already existed. I'm happy to add a delta, which would check if there were spikes within delta samples of the first one?
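One possible way to add such a delta (a sketch of the proposal only, not something implemented in this PR) is to quantise sample indices into bins of width delta before counting coincidences:

import numpy as np

delta = 4  # hypothetical tolerance, in samples

# toy spike times (samples); the first two spikes are within delta samples of each other
sample_index = np.array([100, 103, 500])

# quantise into bins of width delta, then count coincidences per bin as before
binned_index = sample_index // delta
unique_bins, counts = np.unique(binned_index, return_counts=True)
print(unique_bins[counts >= 2])  # bins containing a synchrony event of size >= 2
# note: simple binning can miss pairs that straddle a bin edge; a sliding-window check would be stricter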

chrishalcrow avatar Apr 23 '24 07:04 chrishalcrow

Let's keep it like this for now, since this was the original design. :)

alejoe91 avatar Apr 23 '24 07:04 alejoe91