Allow import of NWB file with multiple acquisition "Raw" data objects

Open rly opened this issue 3 years ago • 8 comments

Although uncommon, some NWB files have multiple ElectricalSeries or other objects under /acquisition/raw. However, spyglass is currently limited to importing the NWB file only if there is a single object there.

rly avatar Jan 26 '23 00:01 rly

@rly I do not understand this. In data_import.insert_sessions, if the NWB file argument is a string, it is treated as a single file. If it is a list, it is treated as multiple files and each is read one at a time. See insert_sessions.py, lines 24-27 (https://github.com/LorenFrankLab/spyglass/blob/master/src/spyglass/data_import/insert_sessions.py#L24).
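The string-versus-list handling described here amounts to something like the following. This is only a minimal sketch of the idea, not the code in insert_sessions.py; the function name `normalize_file_names` and the file names are hypothetical.

```python
from typing import List, Union


def normalize_file_names(nwb_file_names: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: always return a list of file names."""
    if isinstance(nwb_file_names, str):
        # A bare string is treated as a single file.
        return [nwb_file_names]
    # Anything else is treated as several files, handled one at a time.
    return list(nwb_file_names)


single = normalize_file_names("animal_a.nwb")
several = normalize_file_names(["animal_a.nwb", "animal_b.nwb"])
```

Note that this covers multiple files only; it says nothing about multiple objects inside one file.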

If this does not address your issue, can you give me an API call that illustrates where the problem occurs?

zoldello avatar Mar 09 '23 05:03 zoldello

Bumping this bc I am running into the same issue for my project with the Jadhav Lab.

pauladkisson avatar Jan 07 '25 01:01 pauladkisson

Raw only has Session in its primary key, which prevents multiple entries per Session in the current table definition. Altering this table definition seems tricky, seeing as it is the base for so many analyses. Brainstorming approaches:

  • adding a primary key (like eseries_number) to Raw. Not sure that would be possible to implement in existing databases
  • adding a part table to Raw that could store information on additional eseries per session. Would need to think carefully about how to reference alternative eseries in existing downstream tables
  • In cases of multiple e-series, making a second Session entry for the other eseries object (e.g. nwb_file_name = animal01012025_eseries2_.nwb). A little hacky, but avoids needing to change any core table structures

@CBroz1 any thoughts on the practicality of any of these?

samuelbray32 avatar Feb 03 '25 15:02 samuelbray32

Adding a primary key

Our current assumption with table alteration updates to Spyglass is that changes can be made with DataJoint. Ideally, database admins of other instances would not need to run table alterations through MySQL. Altering primary keys has presumably been on the DataJoint roadmap for 6y, given the wording of the errors, so support seems unlikely to arrive soon.

Violating our assumption would mean providing a complex migration script, or altering a backup and declaring a new instance from it. That's a tough and risky road

Part table vs session-name convention

Deciding between these two comes down to a question of how multiple e-series are treated downstream. To what extent do they need to be processed as one (-> parts) vs independently and only compared later (-> convention)? I think I would want to chat more about this before making a recommendation, but I can lay out some pros/cons

Part table

Downstream tables would now need to check for the presence of part entries, or maybe a boolean secondary key in the master table, before processing.

  • Pro: explicit relationship in the tables
  • Con: significant dev to implement

Convention

Something in the primary key of the session name to highlight the relationship

  • Pro: Simple to implement, delaying the burden to only needed cases
  • Cons:
    • Implicit knowledge, documentation burden
    • May trigger #453, which can be a headache to navigate

CBroz1 avatar Feb 03 '25 16:02 CBroz1

Thanks Chris! Agree with your points above.

@pauladkisson , could you give a little more detail about the scenario for the multiple raw objects? Are these just different probes/devices that were each saved to separate raw acquisition objects in the nwb?

samuelbray32 avatar Feb 03 '25 16:02 samuelbray32

Thanks @samuelbray32 and @CBroz1 for looking into this.

In this current scenario, we have the ephys data split into multiple recording epochs (sleep, wake, behavior, etc.) within the context of a daily session (grouped together for spike sorting and behavior).

But the more general point is that NWB encourages users to put all their raw acquired data under acquisition (https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_file.html) including ephys but also other modalities like video. In the current version of sgi.insert_sessions, any other object in acquisition will break the insertion, even if it's not an ElectricalSeries.

pauladkisson avatar Feb 03 '25 18:02 pauladkisson

I see. If I understand correctly, there are two largely separate issues here:

1. Multiple electrical series objects in nwb.acquisition

  • In our lab, the standard has been to concatenate the timestamps and voltage measurements from multiple epochs into a single electrical series object (our conversion package). Separating periods for spike sorting, behavior analysis, etc. is done using intervals in spyglass.
  • In your case these are stored as separate objects.
  • Given this, I would lean towards the parts table approach in Raw. That way we can try to converge the behavior of these two ways of organizing at an early table and not need to worry about the many downstream tables.
  • Todo:
    • Make Raw.AdditionalEseries part table
    • Generalize the Raw.make() function to add to this parts table if multiple objects
    • Change the fetch_nwb/fetch1_dataframe to return the concatenated parts
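To make the concatenation idea concrete, here is a rough sketch of joining several per-epoch series into one ordered series. It is not spyglass or pynwb code: `SeriesStub` and `concatenate_series` are hypothetical stand-ins, and plain lists are used in place of arrays so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SeriesStub:
    """Hypothetical stand-in for one ElectricalSeries (not a pynwb class)."""
    name: str
    timestamps: List[float]    # one entry per sample
    data: List[List[float]]    # samples x channels


def concatenate_series(series: Sequence[SeriesStub]) -> SeriesStub:
    """Join several series into one, ordered by first timestamp."""
    non_empty = [s for s in series if s.timestamps]
    ordered = sorted(non_empty, key=lambda s: s.timestamps[0])
    timestamps: List[float] = []
    data: List[List[float]] = []
    for s in ordered:
        if len(s.timestamps) != len(s.data):
            raise ValueError(f"{s.name}: timestamps and data differ in length")
        # Refuse overlapping epochs rather than silently interleaving samples.
        if timestamps and s.timestamps[0] <= timestamps[-1]:
            raise ValueError(f"{s.name}: overlaps the previous series")
        timestamps.extend(s.timestamps)
        data.extend(s.data)
    return SeriesStub(name="concatenated", timestamps=timestamps, data=data)


sleep = SeriesStub("sleep", [0.0, 0.1], [[1.0, 2.0], [3.0, 4.0]])
wake = SeriesStub("wake", [5.0, 5.1], [[5.0, 6.0], [7.0, 8.0]])
merged = concatenate_series([wake, sleep])  # input order does not matter
```

The same logic could run either at conversion time (one series written to the NWB file) or at fetch time (parts joined when the data is read back), which is the choice being discussed here.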

2. Non-electrical series objects in nwb.acquisition

  • Raw.make() already checks that the data is an electrical series, so there is no risk that other data will end up in Raw
  • My understanding is that there's still a decent degree of variability in where groups organize data in their nwb, making full standardization of the ingestion tricky.
  • We've put together a documentation table detailing where the ingest_sessions function is pulling data from in the nwb to hopefully help groups make nwbs in a compatible manner.
  • For existing nwb's with different organization, groups may need to make their own version of ingest_sessions specific to their data organization
  • That said, if the errors originate from inside a table's make function, let us know in a new issue and we can work to generalize them

samuelbray32 avatar Feb 03 '25 22:02 samuelbray32

1. Multiple electrical series objects in nwb.acquisition

In our lab, the standard has been to concatenate the timestamps and voltage measurements from multiple epochs into a single electrical series object

Although it's a bit clunky, I can relatively easily adjust the nwb conversion so that all of the epochs are stored in a single ElectricalSeries. Especially if the Part Table solution would be a significant dev effort, I think this is the easiest fix. And, in general, it should be feasible for any lab to concatenate all their raw ElectricalSeries objects together.

2. Non-electrical series objects in nwb.acquisition

There are definitely going to be cases in which an nwbfile has multiple modalities in acquisition, which won't be able to all fit into a single ElectricalSeries. Now, one could move the non-ephys raw acquired data into a processing module, but that would specifically violate the best practices for acquisition vs processing: https://nwbinspector.readthedocs.io/en/dev/best_practices/nwbfile_metadata.html#file-organization.

Plus, it should be pretty easy (I think) to allow other objects in acquisition. All you would have to do is ignore any non-ephys objects, instead of asserting that the only object in acquisition is an ElectricalSeries: https://github.com/LorenFrankLab/spyglass/blob/dbae127f79c2a440dbaf3fd898e529c0e27906f2/src/spyglass/common/common_ephys.py#L304
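Roughly what I have in mind, as a sketch only. The stub classes and `find_electrical_series` are hypothetical; real code would presumably check `isinstance(obj, pynwb.ecephys.ElectricalSeries)` against the objects in `nwbfile.acquisition`, and I have not verified how this would fit into the current Raw.make().

```python
from typing import Dict, List


class ElectricalSeriesStub:
    """Hypothetical stand-in for pynwb.ecephys.ElectricalSeries."""


class ImageSeriesStub:
    """Hypothetical stand-in for a non-ephys acquisition object, e.g. video."""


def find_electrical_series(acquisition: Dict[str, object]) -> List[str]:
    """Return names of ephys objects, ignoring every other modality."""
    return [
        name
        for name, obj in acquisition.items()
        if isinstance(obj, ElectricalSeriesStub)
    ]


acquisition = {
    "e-series": ElectricalSeriesStub(),
    "video": ImageSeriesStub(),
}
ephys_names = find_electrical_series(acquisition)
# Validate the filtered result, not the whole acquisition group.
if len(ephys_names) != 1:
    raise ValueError(f"expected one ElectricalSeries, found {len(ephys_names)}")
```

This keeps the single-ElectricalSeries requirement for now but stops other modalities in acquisition from breaking the insert.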

pauladkisson avatar Feb 04 '25 01:02 pauladkisson