pynwb
pynwb copied to clipboard
Make NWBFile identifier optional
2) Feature Request
As per our discussion at the hackathon re: session_id and identifier of NWBFile, identifier should no longer be required. session_id should be highly encouraged, but in order to maintain backwards compatibility, I think we should not make it required.
Is the idea to auto-generate identifier when it is not provided?
An NWBFile (and all other neurodata_types) now have practically unique object IDs, which can serve the purpose of identifying an NWBFile and its contents*. As such, I think we can make identifier optional now and not auto-generate it.
Making identifier optional is fine. I think we may want to update the doc for the identifier field as well. Currently the doc discusses an example on how to construct the identifier. Instead, I think we should just say that is should be unique, and use DOI as an example.
Note that making identifier optional in the API will break backwards compatibility of the API. i.e. calls such as:
nwbfile = NWBFile('description, 'id', start_time)
will no longer work. Because identifier will have a default value and session_start_time does not, the positional arguments of __init__ will be parsed as: session_description, session_start_time and then identifier (if I just make identifier the first keyword argument).
To help users keep their existing code working at least for this basic example, I propose another change: rename NWBFile to Session (a change we were thinking of making anyway) and keep NWBFile as an alias to Session with the old argument ordering. What do you think?
@rly I'm more concerned about backwards compatibility of data than conversion scripts. Would there be any change to the schema? If we rename NWBFile to Session I'm pretty sure we'll cause issues for data that is already written. If not, then there will be a pretty big difference between the API and the schema.
Hm. I see your point. Yes, it would be pretty confusing to have the API and schema refer to different names for the NWBFile/Session. Ok, never mind.
Proposed solution from @oruebel @ajtritt @bendichter : change the schema to make identifier optional, and do not change the API until the next major version of the NWB schema & PyNWB. For now, in PyNWB, if identifier is empty string, do not populate the identifier dataset.