Allow import of NWB file with multiple acquisition "Raw" data objects
Although uncommon, some NWB files have multiple ElectricalSeries or other objects under /acquisition/raw. However, spyglass can currently import an NWB file only if there is a single object there.
@rly I do not understand this. In data_import.insert_sessions, if the nwb file is a string, it is considered a single file. If it is a list, it is considered multiple files and each is read one at a time. See insert_sessions.py, lines 24-27 (https://github.com/LorenFrankLab/spyglass/blob/master/src/spyglass/data_import/insert_sessions.py#L24).
If this is not a solution to your issue, can you give me an API call that illustrates where the issue happens?
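A minimal plain-Python sketch of the str-vs-list dispatch described above. `normalize_nwb_file_names` and `insert_sessions_sketch` are hypothetical helpers written for illustration, not spyglass's actual code; they only mirror the behavior described (a string is one file, a list is several files handled one at a time).

```python
from typing import List, Union


def normalize_nwb_file_names(nwb_file_names: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: a str is one file, a list is several files."""
    if isinstance(nwb_file_names, str):
        return [nwb_file_names]  # a single file name
    if isinstance(nwb_file_names, list):
        return list(nwb_file_names)  # several files, each handled separately
    raise TypeError("expected a str or a list of str")


def insert_sessions_sketch(nwb_file_names: Union[str, List[str]]) -> List[str]:
    """Hypothetical stand-in for the per-file loop."""
    processed = []
    for name in normalize_nwb_file_names(nwb_file_names):
        processed.append(name)  # real code would open and insert this file here
    return processed
```

Note that the loop runs per file, so it does not change how many objects under acquisition a single file may contain, which is the case raised in the original report.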
Bumping this bc I am running into the same issue for my project with the Jadhav Lab.
Raw only has Session in its primary key, which prevents multiple entries per Session in the current table definition. Altering this table definition seems tricky, since it is the base for so many analyses. Brainstorming approaches:
- Adding a primary key (like eseries_number) to Raw. Not sure that would be possible to implement in existing databases.
- Adding a part table to Raw that could store information on additional eseries per session. Would need to think carefully about how to reference alternative eseries in existing downstream tables.
- In cases of multiple e-series, making a second Session entry for the other eseries object (e.g. nwb_file_name = animal01012025_eseries2_.nwb). A little hacky, but avoids needing to change any core table structures.
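To illustrate why a primary key of only the session blocks a second e-series, here is a toy in-memory table. `ToyTable`, `insert_row`, and the `object_id` field are hypothetical and only imitate a primary-key uniqueness constraint; this is not DataJoint. Adding the hypothetical eseries_number attribute from the first option to the key lets two rows share a session.

```python
from typing import Dict, List, Tuple


class ToyTable:
    """Hypothetical in-memory table that enforces primary-key uniqueness."""

    def __init__(self, primary_key: List[str]) -> None:
        self.primary_key = primary_key
        self.rows: Dict[Tuple, dict] = {}

    def insert_row(self, row: dict) -> None:
        key = tuple(row[attr] for attr in self.primary_key)
        if key in self.rows:
            raise ValueError(f"duplicate primary key {key}")
        self.rows[key] = row


# Key of only the session name, as described for Raw: the second e-series collides.
raw = ToyTable(primary_key=["nwb_file_name"])
raw.insert_row({"nwb_file_name": "animal01012025_.nwb", "object_id": "eseries1"})
try:
    raw.insert_row({"nwb_file_name": "animal01012025_.nwb", "object_id": "eseries2"})
except ValueError:
    pass  # rejected: same session means same primary key

# Hypothetical eseries_number added to the key: both rows fit.
raw_v2 = ToyTable(primary_key=["nwb_file_name", "eseries_number"])
for number, object_id in enumerate(["eseries1", "eseries2"], start=1):
    raw_v2.insert_row(
        {"nwb_file_name": "animal01012025_.nwb", "eseries_number": number, "object_id": object_id}
    )
```

The toy shows the schema logic only; the hard part for existing databases is migrating the key, not the key design itself.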
@CBroz1 any thoughts on practicality on any of these?
Adding a primary key
Our current assumption with table-alteration updates to Spyglass is that changes can be made with DataJoint. Ideally, database admins of other instances would not need to run table alterations through MySQL. Given the wording of DataJoint's errors, altering primary keys has presumably been on their roadmap for six years, so this seems unlikely.
Violating our assumption would mean providing a complex migration script, or altering a backup and declaring a new instance from it. That's a tough and risky road.
Part table vs session-name convention
Deciding between these two comes down to how multiple e-series are treated downstream. To what extent do they need to be processed as one (-> parts) vs. independently and only compared later (-> convention)? I would want to chat more about this before making a recommendation, but I can lay out some pros/cons.
Part table
Downstream tables would need to check for the presence of part entries (or perhaps a boolean secondary key in the master table) before processing.
- Pro: explicit relationship in the tables
- Con: significant dev to implement
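A rough sketch of the part-table layout and the downstream check it implies, using plain dicts rather than DataJoint. The names `raw_master`, `additional_eseries`, and `eseries_for_session`, and the example object IDs, are hypothetical; a real implementation would be a DataJoint part table queried by restriction.

```python
from typing import Dict, List, Tuple

# Master table: one row per session (primary key unchanged).
raw_master: Dict[str, dict] = {
    "animal01012025_.nwb": {"object_id": "eseries_sleep"},
    "animal01022025_.nwb": {"object_id": "eseries_all"},
}

# Hypothetical part table: extra e-series keyed by (session, index).
additional_eseries: Dict[Tuple[str, int], dict] = {
    ("animal01012025_.nwb", 1): {"object_id": "eseries_wake"},
    ("animal01012025_.nwb", 2): {"object_id": "eseries_behavior"},
}


def eseries_for_session(nwb_file_name: str) -> List[str]:
    """Return the master e-series followed by any part entries, in index order."""
    ids = [raw_master[nwb_file_name]["object_id"]]
    extra_keys = sorted(k for k in additional_eseries if k[0] == nwb_file_name)
    ids.extend(additional_eseries[k]["object_id"] for k in extra_keys)
    return ids
```

Every downstream consumer has to go through a helper like this (or check a flag) instead of reading the master row alone, which is where the development cost comes from.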
Convention
Something in the session name (the primary key) to highlight the relationship
- Pro: Simple to implement, delaying the burden to only needed cases
- Cons:
- Implicit knowledge, documentation burden
- May trigger #453, which can be a headache to navigate
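A minimal sketch of the session-name convention, following the example name from earlier in the thread (animal01012025_eseries2_.nwb). The helpers `eseries_session_name` and `parse_eseries_session_name` are hypothetical, and the exact pattern is an assumption. The relationship lives only in the string, which is the implicit-knowledge cost listed above.

```python
import re
from typing import Optional, Tuple

SUFFIX = "_.nwb"
PATTERN = re.compile(r"^(?P<base>.+)_eseries(?P<n>\d+)_\.nwb$")


def eseries_session_name(nwb_file_name: str, eseries_number: int) -> str:
    """Hypothetical: ('animal01012025_.nwb', 2) -> 'animal01012025_eseries2_.nwb'."""
    if not nwb_file_name.endswith(SUFFIX):
        raise ValueError(f"expected a name ending in {SUFFIX!r}")
    base = nwb_file_name[: -len(SUFFIX)]
    return f"{base}_eseries{eseries_number}{SUFFIX}"


def parse_eseries_session_name(name: str) -> Optional[Tuple[str, int]]:
    """Recover (parent session name, e-series number), or None if no match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match.group("base") + SUFFIX, int(match.group("n"))
```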
Thanks Chris! Agree with your points above.
@pauladkisson , could you give a little more detail about the scenario for the multiple raw objects? Are these just different probes/devices that were each saved to separate raw acquisition objects in the nwb?
Thanks @samuelbray32 and @CBroz1 for looking into this.
In this scenario, we have the ephys data split into multiple recording epochs (sleep, wake, behavior, etc.) within the context of a daily session (grouped together for spike sorting and behavior).
But the more general point is that NWB encourages users to put all their raw acquired data under acquisition (https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_file.html), including ephys but also other modalities like video. In the current version of sgi.insert_sessions, any additional object in acquisition will break the insertion, even if it's not an ElectricalSeries.
I see. If I understand correctly, there are two largely separate issues here:
1. Multiple electrical series objects in nwb.acquisition
- In our lab, the standard has been to concatenate the timestamps and voltage measurements from multiple epochs into a single electrical series object (our conversion package). Separating periods for spike sorting, behavior analysis, etc. is done using intervals in spyglass.
- In your case these are stored as separate objects.
- Given this, I would lean towards the part table approach in Raw. That way we can try to converge the behavior of these two ways of organizing data at an early table and not need to worry about the many downstream tables.
- Todo:
  - Make a Raw.AdditionalEseries part table
  - Generalize the Raw.make() function to add to this part table if there are multiple objects
  - Change fetch_nwb/fetch1_dataframe to return the concatenated parts
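A hypothetical sketch of the generalized make step from the todo list: the first e-series fills the master row and any others become part rows. `split_into_master_and_parts`, the `eseries_index` field, and the input format are illustrative only, not spyglass's `Raw.make()`. Ordering by first timestamp is an assumption, chosen so a later fetch could concatenate the parts in time order.

```python
from typing import Dict, List, Tuple


def split_into_master_and_parts(
    nwb_file_name: str, eseries: Dict[str, float]
) -> Tuple[dict, List[dict]]:
    """Hypothetical: eseries maps object_id -> first timestamp (s).

    Returns one master row and a list of part rows, ordered by start time.
    """
    if not eseries:
        raise ValueError("no ElectricalSeries found in acquisition")
    ordered = sorted(eseries, key=eseries.get)  # object_ids by start time
    master = {"nwb_file_name": nwb_file_name, "object_id": ordered[0]}
    parts = [
        {"nwb_file_name": nwb_file_name, "eseries_index": i, "object_id": oid}
        for i, oid in enumerate(ordered[1:], start=1)
    ]
    return master, parts
```

With a single e-series the part list is empty, so files organized the way our lab does them would produce exactly the rows they produce today.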
2. Non-electrical series objects in nwb.acquisition
- Raw.make() already checks that the data is an electrical series, so there's no worry that other data will end up in Raw.
- My understanding is that there's still a decent degree of variability in where groups organize data in their NWB files, making full standardization of the ingestion tricky.
- We've put together a documentation table detailing where the ingest_sessions function pulls data from in the NWB file, to hopefully help groups make NWB files in a compatible manner.
- For existing NWB files with a different organization, groups may need to make their own version of ingest_sessions specific to their data organization.
- That said, if the errors originate from inside a table's make function, let us know in a new issue and we can work to generalize them.
1. Multiple electrical series objects in nwb.acquisition
In our lab, the standard has been to concatenate the timestamps and voltage measurements from multiple epochs into a single electrical series object
Although it's a bit clunky, I can relatively easily adjust the nwb conversion so that all of the epochs are stored in a single ElectricalSeries. Especially if the Part Table solution would be a significant dev effort, I think this is the easiest fix. And, in general, it should be feasible for any lab to concatenate all their raw ElectricalSeries objects together.
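A minimal numpy sketch of the conversion-side concatenation, assuming each epoch is a (timestamps, data) pair with data shaped (n_samples, n_channels). `concatenate_epochs` is a hypothetical helper, and writing the result into a single pynwb ElectricalSeries is left out. It checks that channel counts match and that epochs do not overlap, since timestamps in one series should be ascending (an NWB best practice).

```python
from typing import List, Tuple

import numpy as np


def concatenate_epochs(
    epochs: List[Tuple[np.ndarray, np.ndarray]],
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical: stack per-epoch (timestamps, data) into one time series.

    timestamps: shape (n_samples,); data: shape (n_samples, n_channels).
    """
    if not epochs:
        raise ValueError("need at least one epoch")
    epochs = sorted(epochs, key=lambda ep: ep[0][0])  # order epochs by start time
    n_channels = epochs[0][1].shape[1]
    for ts, data in epochs:
        if data.shape != (len(ts), n_channels):
            raise ValueError("timestamp/data lengths or channel counts differ")
    timestamps = np.concatenate([ts for ts, _ in epochs])
    if np.any(np.diff(timestamps) <= 0):
        raise ValueError("epochs overlap or timestamps are not increasing")
    data = np.concatenate([d for _, d in epochs], axis=0)
    return timestamps, data
```

Keeping explicit timestamps (rather than a single rate and start time) preserves the gaps between epochs, which spyglass intervals can then select.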
2. Non-electrical series objects in nwb.acquisition
There are definitely going to be cases in which an NWB file has multiple modalities in acquisition, which can't all fit into a single ElectricalSeries. One could move the non-ephys raw acquired data into a processing module, but that would specifically violate the best practices for acquisition vs. processing: https://nwbinspector.readthedocs.io/en/dev/best_practices/nwbfile_metadata.html#file-organization.
Plus, it should be pretty easy (I think) to allow other objects in acquisition: all you would have to do is ignore any non-ephys objects instead of asserting that the only object in acquisition is an ElectricalSeries (https://github.com/LorenFrankLab/spyglass/blob/dbae127f79c2a440dbaf3fd898e529c0e27906f2/src/spyglass/common/common_ephys.py#L304).
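A minimal sketch contrasting the strict check with the suggested filter. To stay self-contained it uses stand-in classes instead of pynwb; in real code the isinstance check would be against pynwb.ecephys.ElectricalSeries and the mapping would be nwbfile.acquisition. `strict_single_eseries` mirrors the behavior as described in this comment and is not copied from spyglass.

```python
from typing import Dict, List


class FakeElectricalSeries:
    """Stand-in for pynwb.ecephys.ElectricalSeries (illustration only)."""


class FakeImageSeries:
    """Stand-in for a non-ephys acquisition object such as video."""


def strict_single_eseries(acquisition: Dict[str, object]) -> str:
    """Described current behavior: exactly one object, and it must be ephys."""
    assert len(acquisition) == 1, "expected exactly one acquisition object"
    name, obj = next(iter(acquisition.items()))
    assert isinstance(obj, FakeElectricalSeries), "object is not an ElectricalSeries"
    return name


def electrical_series_names(acquisition: Dict[str, object]) -> List[str]:
    """Proposed behavior: keep ElectricalSeries objects, ignore everything else."""
    return [n for n, obj in acquisition.items() if isinstance(obj, FakeElectricalSeries)]
```

If the filtered list still has more than one entry, the multiple-ElectricalSeries question from point 1 applies again.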