spikeinterface icon indicating copy to clipboard operation
spikeinterface copied to clipboard

NWB recording with multiple electrode groups (e.g. probes) has non-unique channel locations

Open khl02007 opened this issue 6 months ago • 15 comments

Right now when spikeinterface opens an NWB file as a si.Recording, the channel locations are defined by looking up rel_x etc columns in the electrodes table of the NWB file. But because rel_x etc is technically the relative coordinates within an electrode group, it can lead to non-unique channel locations. This will happen e.g. if the NWB file has electrical series from two probes that are of identical type but implanted in different locations and the electrode groups are defined at the level of probes. This seems bad to me but not sure what the solution is. Could use the x etc columns in electrode table, or looking up target locations that may be a property of electrode groups. Alternatively could return a list of recordings, one for each electrode group, rather than a single recording objects.

related discussion: https://github.com/LorenFrankLab/trodes_to_nwb/issues/120

khl02007 avatar Jun 18 '25 20:06 khl02007

@h-mayorquin maybe you know this better than I do?

zm711 avatar Jun 18 '25 20:06 zm711

Hey, @khl02007

As I think you are probably aware, these problems comes from the fact that there is not a good distinction between channels and probes table. This should be addressed with ndx-extracellular once we get some bandwidth to work on that. Meanwhile, we need to solve it. I have some ideas but first a question about the data:

Right now when spikeinterface opens an NWB file as a si.Recording, the channel locations are defined by looking up rel_x etc columns in the electrodes table of the NWB file. But because rel_x etc is technically the relative coordinates within an electrode group, it can lead to non-unique channel locations

When we load an ElectricalSeries as an extractor we use (or should use!) the electrodes attribute to check which rows of the ElectrodeTable should be added as a properties. Within the rows associated with a single ElectricalSeries it appears to me that rel_x and company should be unique. Can you tell me more about this to see if I am wrong or if there is some error in the metadata. How comes that the electrode rows associated with an ElectricalSeries have non-unique rel_x, rel_y and rel_z?

Now a general question: Do you think it would be bad if we were to build the probes by default from the absolute and not the relative locations? I think this will improve some of the cases as well as address the problem that some people sometimes write "x", "y", "Z" (absolute locations) but not relative (see here)

h-mayorquin avatar Jun 18 '25 20:06 h-mayorquin

@h-mayorquin thanks for the thoughtful response

in the nwb files that our lab generates, the rel_x, rel_y, and rel_z of each electrode are unique within the electrode group that it belongs to but may not be unique across electrode groups. So there might be multiple electrodes in the electrode table with relative location (0,0) but that belong to different electrode groups. this is a consequence of our decision to (i) interpret an electrode group as a probe and (ii) provide only relative locations for the rel_x etc columns within the electrode group instead of global absolute locations. we do include the target locations for each electrode group / probe as properties, so in theory one can combine this with the rel coordinates to get the absolute coordinates, but the si nwb extractor doesn't do this at the moment.

I'm a bit hesitant to suggest a solution because I think other NWB users could make different decisions. right now electrode_group can represent any grouping of electrodes (and need not be something natural like a shank or a probe). if you defined the electrode group as basically containing all electrodes regardless of whether they come from different probes, then we can build in the absolute locations as you say. using the x, y, z columns seems like a reasonable idea, though i'll note that we gotta make sure the same x y and z are used for both these columns and the rel columns (as you prob know, often the rel coordinates are relative to the plane of the probes, whereas the x, y, z may be something like the AP, ML, DV axis used during surgery).

hopefully this makes it clearer as to why I'm running into this issue. one practical consequence is that this prevents me from sorting all the channels at once. I could break them up into smaller groups based on probes / shanks though. and I do hope that ndx-extracellular happens sooner rather than later, as it seems the issue boils down to nonstandardization within nwb when it comes to specifying these data

khl02007 avatar Jun 18 '25 21:06 khl02007

in the nwb files that our lab generates, the rel_x, rel_y, and rel_z of each electrode are unique within the electrode group that it belongs to but may not be unique across electrode groups. So there might be multiple electrodes in the electrode table with relative location (0,0) but that belong to different electrode groups. this is a consequence of our decision to (i) interpret an electrode group as a probe and (ii) provide only relative locations for the rel_x etc columns within the electrode group instead of global absolute locations. we do include the target locations for each electrode group / probe as properties, so in theory one can combine this with the rel coordinates to get the absolute coordinates, but the si nwb extractor doesn't do this at the moment.

Thanks for the explanation. All of this makes sense. First, to focus on your problem at hand. I think the NWBRecordingExtractor should take an optional list of the electrode groups (by name) that can be passed to only select that part of the ElectricalSeries. If the list is empty, all the associated electrodes are used, otherwise, only the ones that match in name (which can be only one). What do you think?

Now, as I maintain neuroconv, I am curious on which pipeline and acquisition system you used to get to write your NWB files. Most of the time I separate one ElectricalSeries per probe so this is not a problem but this is a valid use of the spec as far as I am concerned. I am just asking so I know more about how people write their NWB files.

. right now electrode_group can represent any grouping of electrodes (and need not be something natural like a shank or a probe). if you defined the electrode group as basically containing all electrodes regardless of whether they come from different probes, then we can build in the absolute locations as you say.

Check this: https://github.com/NeurodataWithoutBorders/nwb-schema/issues/631

If you have anything to add it would be great.

i'll note that we gotta make sure the same x y and z are used for both these columns and the rel columns (as you prob know, often the rel coordinates are relative to the plane of the probes, whereas the x, y, z may be something like the AP, ML, DV axis used during surgery).

Yes, it is the second case that I am wondering about. Restricting ourselves to the 2D case. If we had AP, ML and DV coordinates and we built a probe from there, would sorting or other algorithms be affected by the case that the coordinates have an offset? My intuition is that it should work in most of the cases but I am not sure.

h-mayorquin avatar Jun 18 '25 21:06 h-mayorquin

Thanks for the explanation. All of this makes sense. First, to focus on your problem at hand. I think the NWBRecordingExtractor should take an optional list of the electrode groups (by name) that can be passed to only select that part of the ElectricalSeries. If the list is empty, all the associated electrodes are used, otherwise, only the ones that match in name (which can be only one). What do you think?

I think this is a good idea. At least that will return a recording object (or a list of recording objects) with channel locations that are unique. A minor suggestion: Would the user have to know the names of the electrode group in advance? is it possible to have a function that allows the user to discover this?

Now, as I maintain neuroconv, I am curious on which ](pipeline and acquisition system you used to get to write your NWB files. Most of the time I separate one ElectricalSeries per probe so this is not a problem but this is a valid use of the spec as far as I am concerned. I am just asking so I know more about how people write their NWB files.

We use trodes_to_nwb which is a package specifically for converting rec files generated by SpikeGadgets hardware to nwb. It uses a neurodata extension called nwb-franklab-novela for defining probes etc that are not currently specified by nwb schema. the nwb files then go into spyglass, our analysis pipeline (check out our manuscript here; we give a shout out to neuroconv). In writing the electrical series we concatenate all electrodes in all probes (we also concatenate all recording epochs across time), generating one large array. another thing that may be unique to us is that we use explicit timestamps instead of start time and sampling rate. the timestamps are unix times we get from ptp during acquisition. we do this because our recordings are long and we sometimes have dropped packets due to instabilities. using unix time also makes alignment across different modalities trivial.

Check this: https://github.com/NeurodataWithoutBorders/nwb-schema/issues/631 If you have anything to add it would be great.

sure will try to add something

Yes, it is the second case that I am wondering about. Restricting ourselves to the 2D case. If we had AP, ML and DV coordinates and we built a probe from there, would sorting or other algorithms be affected by the case that the coordinates have an offset? My intuition is that it should work in most of the cases but I am not sure.

I agree it would be fine. I'm just not sure if this should be done at the level of nwb file generation or spikeinterface nwb recording object. My guess is probably the latter.

khl02007 avatar Jun 19 '25 05:06 khl02007

Hi @khl02007!

I think that we should allows recordings to have the same channel locations as long as they are in different groups. Of course, we would then need checks at some stages (i.e., run sorter) to assess that channel locations are unique, hence forcing the user to spike sort by group. @h-mayorquin @zm711 @samuelgarcia what do you think?

alejoe91 avatar Jun 19 '25 13:06 alejoe91

As someone who uses a 3d probe but then sorts by group I've learned that our plotting functions + the gui can't handle this setup. So my concern is that if the positions are the same then we are turning the 2d->3d problem into an nd problem. So not only would be need solutions for sorting, but also for plotting and downstream in the gui. I think it would be great because the trick I've resorted to is to convert my 3d probe into a 2d offset probe (such that I sort by group to be accurate) but then for plotting and gui stuff I just display my third dimension as an offset second dimension.

zm711 avatar Jun 19 '25 13:06 zm711

As someone who uses a 3d probe but then sorts by group I've learned that our plotting functions + the gui can't handle this setup. So my concern is that if the positions are the same then we are turning the 2d->3d problem into an nd problem. So not only would be need solutions for sorting, but also for plotting and downstream in the gui. I think it would be great because the trick I've resorted to is to convert my 3d probe into a 2d offset probe (such that I sort by group to be accurate) but then for plotting and gui stuff I just display my third dimension as an offset second dimension.

Which is a valid solution! But I think that the problem here is that the recording cannot be loaded because setting the probe fails, and this is done directly by the NWB reader. Is that correct @khl02007 ?

alejoe91 avatar Jun 19 '25 13:06 alejoe91

I think my concern is that even at the loading stage to get the recording to work we need to decide whether to store this info as the same x, y but then force them to be in different groups (I,e, encode the z as a group) but then we break downstream spikeinterface stuff (other than sorting) or we enforce a conversion to 2d by allowing the user to specify an offset such that each group is put a distance of xx um away from each other so sorting can work by group but then plotting will work to. And this would need to happen when reading the nwb. (for me at least I construct my own probe so I'm not slowed down in loading my recording).

Or maybe the better option would be option with just groups but we redesign widgets to allow for plot by groups?

zm711 avatar Jun 19 '25 13:06 zm711

Yeah I think it could be a common use case. This is not really 3d, so I don't think that we need to use the third dimension

alejoe91 avatar Jun 19 '25 14:06 alejoe91

Ok, we discussed this in our maintainer meeting. Here are some points:

  1. For your data: are your rel_{value} unique per probe? If so, I think you can sort per probe and it should be OK, right? Otherwise, you can add a property called "probe" to the recording and then sort by that property using run_sorter_by_property. In terms of sorting, I think that should handle these type of cases.

  2. For SpikeInterface: we should add checks to ensure that within each probe, the locations are unique. Note that locations in SpikeInterface are the relative locations in NWB.

  3. We also discussed that NWB should build a probe, but after thinking more about it, I'm not so sure. The problem is that until we have ndx-extracellular, there's no clear way to map the NWB ontology to probes. As discussed here (https://github.com/NeurodataWithoutBorders/nwb-schema/issues/631), in NWB, an electrode group doesn't map directly to a probe: it could be something else (like a shank). So I don't think it makes a lot of sense, and, moreover, it doesn't seem strictly necessary. The locations and groupings are already present. And if they use neuroconv to convert their data, in most cases there should be a property called probe that they can use to sort by. I'm not sure what attaching a probe would add, except for complications like the ones discussed here: https://github.com/SpikeInterface/spikeinterface/issues/3918 and the cloning issue we've talked about before.

Are those all the threads? Let me know if I'm missing something.

h-mayorquin avatar Jun 19 '25 18:06 h-mayorquin

@alejoe91 no the recording loads, just with non-unique channel locations.

@h-mayorquin yeah we can sort per probe. I was just alerting you guys to this because it seems like an unexpected behavior and I was wondering if spikeinterface can provide some easy solutions. stuff like this makes working with nwb via spikeinterface (an increasingly common use case, or so I hope) clunky and difficult. I would also say that in the future we may do more than just spike sorting with spikeinterface and it would just be best if something basic like channel locations were unique.

I agree the problem is NWB ontology. I also wouldn't suggest that spikeinterface create its own probe type because that may not align with NWB in the future. what is holding up ndx-extracellular?

Having said that, supporting electrode group in nwb recording object may not be a bad idea since (i) electrode groups are officially recognized by nwb specs and (ii) are actually required during making of electrodes table right now, so every nwb files are guaranteed to have them. may not be a critical feature but could be convenient / informative.

khl02007 avatar Jun 19 '25 22:06 khl02007

I was just alerting you guys to this because it seems like an unexpected behavior and I was wondering if spikeinterface can provide some easy solutions. stuff like this makes working with nwb via spikeinterface (an increasingly common use case, or so I hope) clunky and difficult.

(bold mine)

I think this is a bit harsh. I’d like to improve the situation, but after thinking it through, I don’t see an easy solution.

I agree with @alejoe91 that we shouldn’t disallow workflows where locations are duplicated. Instead, I believe it's the responsibility of the algorithms that require unique locations to raise clear and informative errors. These messages could guide the user to either slice the channels using recording.channel_slice or use run_by_sorter.

Disallowing these workflows entirely, just for the sake of spike sorting convenience, seems a bit too restrictive, don’t you think?

That said, maybe I’m lacking imagination here. What are you envisioning? Are there ways we could improve this without blocking those workflows entirely?

h-mayorquin avatar Jun 19 '25 22:06 h-mayorquin

@h-mayorquin oh sorry, I take that back if it sounds harsh. I think what you said is reasonable and I dont have any better ideas. this isn't a big deal since there are workarounds as you say.

khl02007 avatar Jun 19 '25 23:06 khl02007

Is OK. But yeah, I am sure there are ways of making it better. I just don't want to avoid workflows that require non-unique locations (e.g. having a recording with a ProbeGroup attached where locations are unique inside the group but not outside of it).

h-mayorquin avatar Jun 19 '25 23:06 h-mayorquin