WIP : get additional data from HerdingspikesSortingExtractor
It seems there was already an attempt to extract the additional info with load_unit_info a while ago.
For now, this PR can extract the location of a unit with get_unit_location, as you would with get_unit_spike_train.
The unit locations are loaded by default, as I don't think the extra memory and computation costs are very significant.
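For illustration, a minimal sketch of the user-facing side, assuming the result is loaded with read_herdingspikes (the file path is a placeholder; get_unit_location is the method added by this PR):

```python
import spikeinterface.extractors as se

# Path is a placeholder for an HS2 output file
sorting = se.read_herdingspikes("HS2_sorted.hdf5")

unit_id = sorting.unit_ids[0]
spike_train = sorting.get_unit_spike_train(unit_id)  # existing API
location = sorting.get_unit_location(unit_id)        # added by this PR: estimated (x, y) for this unit
```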
The rest of the data that HerdingSpikes provides is per spike:
- Amplitude
- Waveforms (possibly redundant?)
- Channel index
- Unit index
- x and y coordinates
This should be quite a bit more memory-intensive to retrieve, and I'm not entirely sure of the use case for it. Nevertheless, I can see that it was considered a possibility in the code that was already there.
Any feedback would be appreciated!
Thank you! I'll take a look asap.
Thanks for this. I think we should have more discussion about the API. If the additional data is at the unit level, we could just add some properties to the sorting object; that would make more sense. If the additional data is at the spike level, we would need an additional function at the API level rather than doing this extractor by extractor.
> If the additional data is at the spike level, we would need an additional function at the API level rather than doing this extractor by extractor.
This is what I was wondering about. The easiest solution for now would be to return matching arrays per unit, but for the sake of reusability I guess the BaseSorting class would need to be able to handle arbitrary per-spike properties.
In the case of HerdingSpikes, a lot of useful data is already computed or extracted: waveforms, locations, and amplitudes are already included, and PCA is also computed but not included in the output. All of this is then recomputed by the sorting analyzer. It would then be up to individual sorting extractors to match the expected data structure.
On the data HS extracts:
- Amplitude: This is not the real amplitude, but a re-scaled version. I don't think these are useful.
- Waveforms: These are peak channel waveforms, internally cached as a memmap for quick PCA computation, and they can be written out to the HDF5 file the extractor reads.
- Channel index: Useful as these are peak channels.
- Unit index: Internal use.
- x and y coordinates: COM-based estimates, very useful as they are very quick to estimate.
I feel the peak channel and x/y locations could be put into a SortingAnalyzer object; that's where they would normally be found. Would this be possible?
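For context, the usual SortingAnalyzer route looks roughly like this (a sketch; the exact extension dependencies and method names should be double-checked against the docs):

```python
import spikeinterface.full as si

# sorting and recording are assumed to be already loaded
analyzer = si.create_sorting_analyzer(sorting, recording)

# unit_locations is built from templates, which are built from sampled waveforms
analyzer.compute(["random_spikes", "waveforms", "templates"])
analyzer.compute("unit_locations", method="center_of_mass")

unit_locations = analyzer.get_extension("unit_locations").get_data()  # one (x, y[, z]) row per unit
```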
I agree with Sam that the unit properties should be written as such. I would be curious about how to handle spike properties if you have them, so I'm tagging along here.
If this is too much for now, I think the PR can stay as is and just expose unit locations as a property.
Regarding individual spikes, being able to load arbitrary properties on top of spike times might require changes deep in the structure. A simpler solution might be for individual extractors to specify "pre-computed" extensions that are then loaded when the sorting analyzer is created.
If the changes can be mostly contained at the extractor level, I think it could avoid some extra abstraction at the sorter level?
This might not be relevant for most use cases, but for instance, computing waveforms on BioCAM data takes about 30 minutes with 100 spikes per unit, and the binary weighs about 180 GB. Meanwhile, the data is already in the HDF5 file read by the extractor, with a total size of less than 200 MB (same dtype, roughly the same cutout times).
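As a rough illustration of what per-spike properties would need to line up with, here is a sketch based on the existing spike vector; the precomputed array and its ordering are hypothetical stand-ins for data read from the HS2 HDF5 file:

```python
import numpy as np

# Flat, time-ordered view of all spikes: fields include sample_index and unit_index
spike_vector = sorting.to_spike_vector()

# Hypothetical per-spike values read from the HS2 file; any real array would have to be
# reordered so that its i-th entry matches the i-th spike in the spike vector
precomputed_amplitudes = np.zeros(len(spike_vector))

# Selecting the values belonging to one unit
unit_index = sorting.ids_to_indices([sorting.unit_ids[0]])[0]
amps_unit_0 = precomputed_amplitudes[spike_vector["unit_index"] == unit_index]
```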
> Regarding individual spikes, being able to load arbitrary properties on top of spike times might require changes deep in the structure. A simpler solution might be for individual extractors to specify "pre-computed" extensions that are then loaded when the sorting analyzer is created.
One of the major goals of spikeinterface is creating reproducible pipelines. If we allow these extensions to be loaded, then we can't guarantee that they could be reproduced in the analyzer. The counterargument is that we support loading some phy structures, but in that case we don't accept the PCs or the amplitudes from phy; we force the user to calculate them.
> One of the major goals of spikeinterface is creating reproducible pipelines. If we allow these extensions to be loaded, then we can't guarantee that they could be reproduced in the analyzer.
Makes sense!
In this case, would this conflict with the "compute spike localizations" extension, even if it's at the unit level?
> In this case, would this conflict with the "compute spike localizations" extension, even if it's at the unit level?
We have a unit_locations extension that I believe this would conflict with (I don't use herdingspikes, so I don't know exactly what this is like). I guess you could store this as an annotation, but that seems heavy for an annotation?
I'm not too familiar with annotations, but with HS2 the locations are just a pair of floats per unit.
I'll try to see if I can access units from both the HDF5 and the sorting object, in which case this could be done externally and not conflict with any extensions.
In this case, it would probably be best to close the PR as it wouldn't be of much use.
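A sketch of what accessing the HDF5 directly might look like; the dataset names ("cluster_id", "centres") are my assumptions about the HS2 output layout and would need to be checked against an actual file:

```python
import h5py

# Dataset names below are assumptions about what HS2 writes out
with h5py.File("HS2_sorted.hdf5", "r") as f:
    centres = f["centres"][:]          # assumed per-cluster (x, y) centre estimates
    cluster_id = f["cluster_id"][:]    # assumed per-spike cluster assignment

# Assuming unit ids in the sorting object correspond to rows of `centres`
locations = {unit_id: centres[int(unit_id)] for unit_id in sorting.unit_ids}
```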
@b-grimaud
I think you could just add a property to the sorting object called hs_unit_location. If this is useful to you, I think it wouldn't hurt to add it!
> I think you could just add a property to the sorting object called hs_unit_location
As is, the resulting sorting object has a property called unit_locations that returns an array of shape (n_units, 2) with the x and y coordinates of the corresponding units.
It might indeed be a bit simpler than calling a function, as is done for spike trains.
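For reference, with the property exposed as described, downstream access would look something like this (property name taken from the current state of this PR):

```python
# One (x, y) row per unit, aligned with sorting.unit_ids
unit_locations = sorting.get_property("unit_locations")
print(dict(zip(sorting.unit_ids, unit_locations)))
```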