
[Feature]: Add an object path as a way to uniquely identify an object in the API

Open h-mayorquin opened this issue 9 months ago • 8 comments

It would be great to have a way to specify an object within the nwbfile that is both unique and independent of the backend. Paths are one abstraction that could serve this purpose, so I am imagining an API that could look like this:

electrical_series = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
electrical_series.get_api_path() == "acquisition/ElectricalSeries"
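
To make the proposal concrete, here is a minimal sketch of how the two methods could round-trip. It uses a toy `Container` class written for this example, not the hdmf or pynwb classes, and the method names `get_object_by_path` and `get_api_path` are the hypothetical ones proposed above; neither is an existing API.

```python
from __future__ import annotations


class Container:
    """Toy stand-in for an in-memory container (NOT the hdmf class)."""

    def __init__(self, name: str, parent: Container | None = None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def get_api_path(self) -> str:
        # Hypothetical method: walk up the parent chain and join the names,
        # leaving out the root so the path is relative to the file.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def get_object_by_path(self, path: str) -> Container:
        # Hypothetical method: descend one child per path segment.
        # Raises KeyError if a segment does not exist.
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node


nwbfile = Container("root")
acquisition = Container("acquisition", parent=nwbfile)
electrical_series = Container("ElectricalSeries", parent=acquisition)

found = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
assert found is electrical_series
assert found.get_api_path() == "acquisition/ElectricalSeries"
```

In this toy version the lookup only touches one child per path segment, which is the property that would matter for the streaming question raised under "Other considerations".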

Use cases

Unlike the object_id, which uniquely identifies an object only within a single NWBFile, a path identifies an object by its location, and that location stays the same across files from different sessions. This can be used for:

  • Building configurations (e.g. chunking, compression) that apply to the same object in conversions, even across different sessions.
  • Quickly accessing specific objects for visualization or analysis in files with a well-known structure.
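
As a rough illustration of the first use case, the sketch below keys a configuration by path and applies it to two toy "sessions". Everything here is made up for the example: the sessions are plain dicts mapping a path to a freshly generated object_id, and the setting names are placeholders rather than any real hdmf, pynwb, or neuroconv configuration schema.

```python
import uuid


def build_session():
    """Toy session: maps each path to a new object_id (different per file)."""
    paths = [
        "acquisition/ElectricalSeries/data",
        "acquisition/TimeSeries/data",
    ]
    return {path: str(uuid.uuid4()) for path in paths}


# Hypothetical configuration, written once and keyed by path.
dataset_config = {
    "acquisition/ElectricalSeries/data": {"chunks": (1000, 64), "compression": "gzip"},
}


def resolve(session, config):
    """Return the settings for each object in this session, keyed by object_id."""
    return {
        object_id: config[path]
        for path, object_id in session.items()
        if path in config
    }


session_a = build_session()
session_b = build_session()

settings_a = resolve(session_a, dataset_config)
settings_b = resolve(session_b, dataset_config)
```

The object_ids in `settings_a` and `settings_b` differ, but both sessions pick up the same settings because the path is the stable key.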

Previous or Similar Art

This function was implemented in neuroconv:

https://github.com/catalystneuro/neuroconv/blob/47a066ca8c58b88064bfecee90cfcfc70409d135/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L28-L44

And it produces output like this:

acquisition/TestDynamicTable/TestColumn/data
acquisition/NewTimeSeries/data
acquisition/TestElectricalSeries/data
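
For readers who want to see how paths of this shape can arise, here is a small sketch that enumerates them from a nested structure. It is not the neuroconv implementation linked above: the file is modeled as a plain nested dict and `iter_dataset_paths` is a hypothetical helper written for this example.

```python
def iter_dataset_paths(tree, prefix=""):
    """Yield a slash-separated path for every non-dict leaf in a nested dict."""
    for name, value in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            # A dict plays the role of a group or container: recurse into it.
            yield from iter_dataset_paths(value, path)
        else:
            # Anything else plays the role of a dataset.
            yield path


# Toy file layout chosen to mirror the example output above.
toy_file = {
    "acquisition": {
        "TestDynamicTable": {"TestColumn": {"data": [1, 2, 3]}},
        "NewTimeSeries": {"data": [0.1, 0.2]},
        "TestElectricalSeries": {"data": [[0, 1], [2, 3]]},
    }
}

paths = list(iter_dataset_paths(toy_file))
for path in paths:
    print(path)
```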

Then the function was ported to pynwb:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/base.py#L290-L324

Complexities

The fact that HDF5 and Zarr might use different paths than the pynwb API can be confusing. An example that @rly pointed out is the electrical series.

Other considerations

  • There might be a better abstraction than a path to build unique identifiers?
  • I think it should be a method and not an attribute because it might be costly to compute. I think a method call signals that better.
  • Streaming considerations: can we reduce the portion of the file visited when accessing an object by path or calculating paths?
  • How does it play with the idea of tagging instead of having a structure? It seems that flat files with tags could make this redundant.
  • Is hdmf the place for this to live or is it better to have it in pynwb?

I probably missed some subtleties from today's discussion, so I am tagging people here so they can correct my mistakes: @rly @bendichter @CodyCBakerPhD

h-mayorquin, Apr 30 '24 17:04