spikeinterface icon indicating copy to clipboard operation
spikeinterface copied to clipboard

Analyzer core extension: improve docstrings

Open alejoe91 opened this issue 11 months ago • 12 comments

Also changed:

  • compute_select_random_spikes to compute_random_spikes
  • some_spikes() function to random_spikes()

alejoe91 avatar Mar 20 '24 17:03 alejoe91

Instead of random_spikes what about sample_spikes?

In line with these three big boys: https://numpy.org/doc/stable/reference/random/generated/numpy.random.sample.html https://docs.python.org/3/library/random.html#random.sample https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

h-mayorquin avatar Mar 20 '24 19:03 h-mayorquin

Instead of random_spikes what about sample_spikes?

What about the fact that sample can mean something very specific for ephys (ie the sample_rate)?

Either name works for me to be honest, but just curious what you think. Does consistency with other libraries make the most sense or trying to avoid using the same word in multiple different ways?

zm711 avatar Mar 20 '24 19:03 zm711

What about the fact that sample can mean something very specific for ephys (ie the sample_rate)?

Mmm I always think on function names as verbs so this connection did not make sense to me. Also, don't we usually use sampling instead of sample for the other concept?

h-mayorquin avatar Mar 20 '24 19:03 h-mayorquin

I think that's a fair enough argument. Functions usually are verbs. (This library uses sampling, but in my head it is sample_rate--so maybe I'm just bringing my own experience to it :)

zm711 avatar Mar 20 '24 20:03 zm711

You mght be right. I think I expressed myself wrongly about. I meant that I did not see the connection before.

I like sample_spikes_indices because:

  1. It is a verb (compared to random)
  2. It communicate that it returns indices (not spikes, times or anything else).
  3. The fact that is a sampling method makes naturally that we can also add sampling methods other than uniform as you see planned it here https://github.com/SpikeInterface/spikeinterface/blob/9d8f462e306c40a225a446c78cd45f89864b6472/src/spikeinterface/core/sorting_tools.py#L140

The last point is because people usually say random when they mean "uniform random distribution".

(In discrete cases people are indeed more careful and do say fair as in "consider a fair dice").

h-mayorquin avatar Mar 20 '24 20:03 h-mayorquin

I guess you're raising the good point that random_spikes (the public api) is relying on the fact that we use spike to mean the time (in samples or seconds) as well as the waveform, as well as the "object" that contains all of those attributes and in this case spike refers to the index to a "spike".

In reality random_spikes is a function that is doing random_spike_indices and then the private function is sample_spike_indices for any function that relies on these. But I think random_spikes is a better name even though it is less precise because at a public level we just want people to know they need to sample some spikes.

So this makes me wonder if it is better to have api consistency (we chose random_spikes so we say sample_random_spikes or we choose the more descriptive function name and say sample_spike_indices.

  1. The fact that is a sampling method makes naturally that we can also add sampling methods other than uniform as you see planned it here

Strong point. I think you've convinced me on the merit of "sample" in this case. But I think the API consistency case could still go either way (to be clear I mean sample_random_spikes vs sample_spike_indices in my mind).

zm711 avatar Mar 20 '24 21:03 zm711

But I think random_spikes is a better name even though it is less precise because at a public level we just want people to know they need to sample some spikes.

I think I am missing something on this sentence. How does random_spikes achieves this better than sample_spike_indices?

h-mayorquin avatar Mar 20 '24 21:03 h-mayorquin

I think for beginners in spike sorting they might actually struggle to understand the idea of spike_indices.

To be clear I’m saying the public API is sorting_analyzer.compute(‘random_spikes) with the internal function for me would be sample_random_spikes because we are sampling this pool of random spikes that the user requested.

Does that make sense? I agree that sample should be included in the name, but I think the public part of the api should be simple/even if using an overloaded term to make sense to beginning users.

zm711 avatar Mar 20 '24 22:03 zm711

Does that make sense? I agree that sample should be included in the name, but I think the public part of the api should be simple/even if using an overloaded term to make sense to beginning users.

OK, I think I get you. You are saying that, for beginners, being more explicit by including indices at the extension name might confuse more than clarify, correct? That could be...I need to think about it.

To be clear I’m saying the public API is sorting_analyzer.compute(‘random_spikes)

You writing it like this is useful because the word compute is already acting as a verb when used like this. How does it look as a parte of a bigger pipeline? If you do, sorting_analyzer.get_data("random_spikes"), what does it* return?

(*) I know this might not exist yet but I know something is on the works.

h-mayorquin avatar Mar 20 '24 22:03 h-mayorquin

OK, I think I get you. You are saying that, for beginners, being more explicit by including indices at the extension name might confuse more than clarify, correct? That could be...I need to think about it

Exactly!

Yeah the current workflow would be something like

# create analyzer...
random_spikes_ext = analyzer.compute('random_spikes', kwargs)
random_spikes = random_spike_ext.get_data()
# or what I actually do
analyzer.compute('random_spikes', kwargs)
random_spikes = analyzer.get_extension('random_spikes').get_data()

basically each Extension is created using the compute. so for example analyzer.compute('spike_amplitudes') or analyzer.compute('quality_metrics') which will return the SortingAnalyzer.Extension. And then there is a get_data that only works on that Extension. Sam was saying he wanted to add a get_extension_data to combine get_extension and get_data

so something like:

random_spikes = analyzer.get_extension_data('random_spikes')

So for another example

analyzer.compute(['random_spikes', 'waveforms', 'templates', 'noise_levels', 'spike_amplitudes', 'spike_locations']).

This uses default arguments and just iterates through and calculates each one, but because I'm doing multiple it returns None instead so the only way to get the data is by get_extension + get_data.

You could also do something like:

analyzer.compute(dict(random_spikes=dict( max_number_of_spikes=300)), principal_components=dict(n_components=3)))

Basically compute is our verb and then we tell what (so the object/noun) that we want to compute.

zm711 avatar Mar 20 '24 22:03 zm711

I agree with @zm711 that spike_indices is an "advanced" concept which will make the API over complicated. Fine to use it internally, but at the higher level compute_random_spikes or sample_random_spikes (it doesn't need to be "compute") would be a better choice.

@samuelgarcia what do you think of sample_random_spikes for the function? The extension will still be called random_spikes

alejoe91 avatar Mar 21 '24 07:03 alejoe91

ciao camarade. thanks for the doc update. as you guess by the comments the random_spike() do not make me happy. lets find another better name

samuelgarcia avatar Mar 21 '24 08:03 samuelgarcia