What should be the default name of `pynwb.misc.Units`?

Open h-mayorquin opened this issue 1 year ago • 2 comments

Right now the following throws an error:

from pynwb.misc import Units
from pynwb.testing.mock.file import mock_NWBFile


nwbfile = mock_NWBFile()
nwbfile.units = Units()


ValueError: Field 'units' on NWBFile must be named 'units'.

The default name of Units is "Units", as recommended by the best practices for naming conventions:

assert Units().name == "Units"

And is defined here: https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/misc.py#L157-L158

However, NWBFile requires the name "units" for this attribute in its field definitions here:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/file.py#L272-L273

So I think that one of the two should give? Which one makes more sense? Maybe the nwbfile.units should accept "Units" as the name? Should it be the other way around?
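The conflict can be illustrated with a toy model in plain Python. This is only a sketch: `ToyUnits` and `ToyNWBFile` are hypothetical stand-ins written for this illustration, not pynwb classes, and the setter below is an assumption about the kind of check involved, not pynwb's actual implementation.

```python
class ToyUnits:
    """Hypothetical stand-in for a Units table; it only models the name."""

    def __init__(self, name="Units"):
        # Default follows the capitalized naming convention.
        self.name = name


class ToyNWBFile:
    """Hypothetical stand-in for a file object with one name-restricted field."""

    REQUIRED_UNITS_NAME = "units"

    def __init__(self):
        self._units = None

    @property
    def units(self):
        return self._units

    @units.setter
    def units(self, value):
        # Reject any object whose name differs from the required one.
        if value.name != self.REQUIRED_UNITS_NAME:
            raise ValueError(
                f"Field 'units' must be named '{self.REQUIRED_UNITS_NAME}'."
            )
        self._units = value


nwbfile = ToyNWBFile()

# The default-named object is rejected, mirroring the error reported above.
try:
    nwbfile.units = ToyUnits()
    rejected = False
except ValueError:
    rejected = True

# Passing the required name explicitly satisfies the check.
nwbfile.units = ToyUnits(name="units")
```

In the toy model, the two defaults disagree, so the caller has to override one of them. Presumably the same workaround applies in pynwb itself, by passing `name="units"` when constructing the Units object, but that is not verified here.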

h-mayorquin, Apr 01 '24 23:04

The name of the Units object stored at the NWBFile level of the file must be "units" according to the schema: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L446-L449

This predates the best practices. Changing that to "Units" at the schema level would make existing NWB 2.0-2.7 files invalid under the 2.8+ schema, which is not ideal. We could modify the APIs to set NWBFile.units from the HDF5 group at /Units instead of /units. However, that might break existing software that does not use the APIs to read the units table (Neurosift may be one). It would also mean that this group would be the only one in the HDF5 file whose name starts with a capital letter, which is visually unappealing but not a big deal.

Alternatively, we could remove the fixed name "units" and modify the APIs to set NWBFile.units to the only Units object in the root group, if present, whatever it is named. That would allow for heterogeneity in what the root file looks like when >95% of use cases will need only a single units table, and would also break existing software that does not use the APIs.
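A minimal sketch of that lookup, under loud assumptions: the root group is modeled as a plain dict mapping each root-level group name to its neurodata type (or `None` for untyped groups), and `find_root_units` is a hypothetical helper invented for this illustration, not part of pynwb or hdmf.

```python
def find_root_units(root_types):
    """Return the name of the single root-level Units group, or None.

    root_types is a dict standing in for the root group: it maps each
    group name to its neurodata type string, or None if untyped.
    """
    matches = [name for name, ntype in root_types.items() if ntype == "Units"]
    if len(matches) > 1:
        # More than one candidate makes the choice ambiguous.
        raise ValueError(
            f"Expected at most one root-level Units group, found {matches}"
        )
    return matches[0] if matches else None


# A file written with the fixed name and one written with the new convention
# both resolve, because the lookup goes by type instead of by name.
legacy_root = {"acquisition": None, "units": "Units"}
renamed_root = {"acquisition": None, "Units": "Units"}

legacy_name = find_root_units(legacy_root)
renamed_name = find_root_units(renamed_root)
```

A lookup by type only helps readers that go through the APIs. Software that hard-codes the path /units would still fail on a file where the group is named differently, which is the compatibility concern raised in this comment.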

Anyway, most people don't care what the object is named under the hood, but I hesitate to change the current naming scheme because of other software relying on the existing schema.

Unfortunately, the default name of the Units type is inconsistent as you have discovered. The easiest fix is to change the default name of Units to units. The alternative is to leave the behavior as is: that way, custom Units objects follow the new best practices, and the only inconvenience falls on people replacing NWBFile.units with a custom Units object that is not named "units".

rly, Apr 02 '24 00:04

Thanks for the full explanation, @rly.

> Unfortunately, the default name of the Units type is inconsistent as you have discovered. The easiest fix is to change the default name of Units to units.

This. I think we should change the default name for pynwb.misc.Units to "units" so that the code above works and we save users some possible confusion. My feeling is that if someone is going to store the units table somewhere other than nwbfile.units (e.g. a processing module), they will also change the name to follow best practices. That said, I acknowledge the trade-off: there is a tension between best practices and backwards compatibility.

Independently of that, I think we should change the newly added mock_units:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/testing/mock/ecephys.py#L125-L152

So these lines work as they should: https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/testing/mock/ecephys.py#L147-L150

I will do a PR for that.

h-mayorquin, Apr 02 '24 00:04