pynwb
pynwb copied to clipboard
Make NWBFile identifier optional
2) Feature Request
As per our discussion at the hackathon re: session_id
and identifier
of NWBFile
, identifier
should no longer be required. session_id
should be highly encouraged, but in order to maintain backwards compatibility, I think we should not make it required.
Is the idea to auto-generate identifier
when it is not provided?
An NWBFile
(and all other neurodata_types) now have practically unique object IDs, which can serve the purpose of identifying an NWBFile and its contents*. As such, I think we can make identifier
optional now and not auto-generate it.
Making identifier optional is fine. I think we may want to update the doc for the identifier field as well. Currently the doc discusses an example on how to construct the identifier. Instead, I think we should just say that is should be unique, and use DOI as an example.
Note that making identifier
optional in the API will break backwards compatibility of the API. i.e. calls such as:
nwbfile = NWBFile('description, 'id', start_time)
will no longer work. Because identifier
will have a default value and session_start_time
does not, the positional arguments of __init__
will be parsed as: session_description
, session_start_time
and then identifier
(if I just make identifier
the first keyword argument).
To help users keep their existing code working at least for this basic example, I propose another change: rename NWBFile
to Session
(a change we were thinking of making anyway) and keep NWBFile
as an alias to Session
with the old argument ordering. What do you think?
@rly I'm more concerned about backwards compatibility of data than conversion scripts. Would there be any change to the schema? If we rename NWBFile
to Session
I'm pretty sure we'll cause issues for data that is already written. If not, then there will be a pretty big difference between the API and the schema.
Hm. I see your point. Yes, it would be pretty confusing to have the API and schema refer to different names for the NWBFile/Session. Ok, never mind.
Proposed solution from @oruebel @ajtritt @bendichter : change the schema to make identifier
optional, and do not change the API until the next major version of the NWB schema & PyNWB. For now, in PyNWB, if identifier
is empty string, do not populate the identifier
dataset.