spikeinterface
Analyzer core extension: improve docstrings
Also changed:
- `compute_select_random_spikes` to `compute_random_spikes`
- the `some_spikes()` function to `random_spikes()`
Instead of `random_spikes`, what about `sample_spikes`?

In line with these three big boys:
- https://numpy.org/doc/stable/reference/random/generated/numpy.random.sample.html
- https://docs.python.org/3/library/random.html#random.sample
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
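For a quick sketch of the `sample` precedent in those libraries (standard-library and NumPy calls only; the pandas `DataFrame.sample` variant needs a DataFrame, so it is omitted here):

```python
import random

import numpy as np

# stdlib: random.sample draws k unique items without replacement
picked = random.sample(range(10), k=3)
print(picked)

# NumPy: the modern way to sample unique indices without replacement
rng = np.random.default_rng(seed=42)
indices = rng.choice(100, size=5, replace=False)
print(sorted(indices.tolist()))
```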
> Instead of `random_spikes` what about `sample_spikes`?
What about the fact that "sample" can mean something very specific for ephys (i.e. the `sample_rate`)?
Either name works for me to be honest, but I'm just curious what you think. Does consistency with other libraries make the most sense, or does avoiding using the same word in multiple different ways?
> What about the fact that "sample" can mean something very specific for ephys (i.e. the `sample_rate`)?
Mmm, I always think of function names as verbs, so this connection did not make sense to me. Also, don't we usually use `sampling` instead of `sample` for the other concept?
I think that's a fair enough argument. Functions usually are verbs. (This library uses `sampling`, but in my head it is `sample_rate`, so maybe I'm just bringing my own experience to it :)
You might be right. I think I expressed myself wrongly above: I meant that I did not see the connection before.
I like `sample_spike_indices` because:
- It is a verb (compared to `random`).
- It communicates that it returns indices (not spikes, times, or anything else).
- The fact that it is a sampling method naturally means we can also add sampling methods other than uniform, as you can see planned here: https://github.com/SpikeInterface/spikeinterface/blob/9d8f462e306c40a225a446c78cd45f89864b6472/src/spikeinterface/core/sorting_tools.py#L140
The last point is because people usually say "random" when they mean "uniform random distribution". (In discrete cases people are indeed more careful and do say "fair", as in "consider a fair die".)
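As a concrete sketch of that last point (a hypothetical signature for illustration, not the actual spikeinterface code linked above), a `method` argument leaves room for non-uniform strategies later:

```python
import numpy as np


def sample_spike_indices(num_spikes, max_num_spikes, method="uniform", seed=None):
    """Return sorted indices of a sampled subset of spikes.

    Only "uniform" (without replacement) is sketched here; other methods
    (e.g. amplitude-weighted) could be added under new `method` values.
    """
    rng = np.random.default_rng(seed)
    if method == "uniform":
        size = min(max_num_spikes, num_spikes)
        chosen = rng.choice(num_spikes, size=size, replace=False)
    else:
        raise NotImplementedError(f"sampling method {method!r} is not implemented")
    # sorted indices keep downstream slicing of the spike vector cheap
    return np.sort(chosen)


indices = sample_spike_indices(1_000, 300, seed=0)
```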
I guess you're raising the good point that `random_spikes` (the public API) is relying on the fact that we use "spike" to mean the time (in samples or seconds), as well as the waveform, as well as the "object" that contains all of those attributes, and in this case "spike" refers to the index of a "spike". In reality `random_spikes` is a function that is doing `random_spike_indices`, and then the private function is `sample_spike_indices` for any function that relies on these. But I think `random_spikes` is a better name, even though it is less precise, because at a public level we just want people to know they need to sample some spikes. So this makes me wonder whether it is better to have API consistency (we chose `random_spikes`, so we say `sample_random_spikes`), or to choose the more descriptive function name and say `sample_spike_indices`.
> The fact that it is a sampling method naturally means we can also add sampling methods other than uniform, as you can see planned here
Strong point. I think you've convinced me on the merit of "sample" in this case. But I think the API consistency case could still go either way (to be clear, I mean `sample_random_spikes` vs `sample_spike_indices` in my mind).
> But I think `random_spikes` is a better name even though it is less precise because at a public level we just want people to know they need to sample some spikes.
I think I am missing something in this sentence. How does `random_spikes` achieve this better than `sample_spike_indices`?
I think for beginners in spike sorting, they might actually struggle to understand the idea of `spike_indices`.
To be clear, I'm saying the public API is `sorting_analyzer.compute('random_spikes')`, and the internal function for me would be `sample_random_spikes`, because we are sampling this pool of random spikes that the user requested.
Does that make sense? I agree that "sample" should be included in the name, but I think the public part of the API should be simple, even if using an overloaded term, to make sense to beginning users.
> Does that make sense? I agree that "sample" should be included in the name, but I think the public part of the API should be simple, even if using an overloaded term, to make sense to beginning users.
OK, I think I get you. You are saying that, for beginners, being more explicit by including "indices" in the extension name might confuse more than clarify, correct? That could be... I need to think about it.
> To be clear, I'm saying the public API is `sorting_analyzer.compute('random_spikes')`
You writing it like this is useful because the word `compute` is already acting as a verb when used like this. How does it look as part of a bigger pipeline? If you do `sorting_analyzer.get_data("random_spikes")`, what does it* return?

(*) I know this might not exist yet, but I know something is in the works.
> OK, I think I get you. You are saying that, for beginners, being more explicit by including "indices" in the extension name might confuse more than clarify, correct? That could be... I need to think about it.
Exactly!
Yeah, the current workflow would be something like:

```python
# create analyzer...
random_spikes_ext = analyzer.compute('random_spikes', **kwargs)
random_spikes = random_spikes_ext.get_data()

# or what I actually do
analyzer.compute('random_spikes', **kwargs)
random_spikes = analyzer.get_extension('random_spikes').get_data()
```
Basically, each `Extension` is created using `compute`. So for example `analyzer.compute('spike_amplitudes')` or `analyzer.compute('quality_metrics')`, which will return the `SortingAnalyzer.Extension`. And then there is a `get_data` that only works on that `Extension`. Sam was saying he wanted to add a `get_extension_data` to combine `get_extension` and `get_data`, so something like:

```python
random_spikes = analyzer.get_extension_data('random_spikes')
```
So, for another example:

```python
analyzer.compute(['random_spikes', 'waveforms', 'templates', 'noise_levels', 'spike_amplitudes', 'spike_locations'])
```

This uses default arguments and just iterates through and calculates each one, but because I'm computing multiple extensions it returns None instead, so the only way to get the data is `get_extension` + `get_data`.
You could also do something like:

```python
analyzer.compute(
    dict(
        random_spikes=dict(max_number_of_spikes=300),
        principal_components=dict(n_components=3),
    )
)
```
Basically, `compute` is our verb, and then we tell it what (the object/noun) we want to compute.
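To make that verb/noun split concrete, here is a minimal toy sketch (plain Python stand-ins, not the real `SortingAnalyzer` classes):

```python
class Extension:
    """Toy stand-in for a computed extension holding its result."""

    def __init__(self, name, data):
        self.name = name
        self._data = data

    def get_data(self):
        return self._data


class Analyzer:
    """Toy stand-in: `compute` is the verb, the extension name is the noun."""

    def __init__(self):
        self._extensions = {}

    def compute(self, name, **params):
        # A real analyzer would dispatch to a registered extension class;
        # here the params themselves stand in for the computed data.
        ext = Extension(name, params)
        self._extensions[name] = ext
        return ext

    def get_extension(self, name):
        return self._extensions[name]

    def get_extension_data(self, name):
        # convenience wrapper combining get_extension() + get_data()
        return self.get_extension(name).get_data()


analyzer = Analyzer()
analyzer.compute("random_spikes", max_number_of_spikes=300)
data = analyzer.get_extension_data("random_spikes")
```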
I agree with @zm711 that `spike_indices` is an "advanced" concept which will make the API overcomplicated. Fine to use it internally, but at the higher level, `compute_random_spikes` or `sample_random_spikes` (it doesn't need to be "compute") would be a better choice.

@samuelgarcia what do you think of `sample_random_spikes` for the function? The extension will still be called `random_spikes`.
Ciao camarade.
Thanks for the doc update.
As you can guess from the comments, `random_spike()` does not make me happy. Let's find a better name.