Make `NWBFile` groups optional :)
Currently, many of the top-level items in NWBFile are explicitly optional (quantity is either * or ?), and several can be interpreted as being optional, but there is a little bit of inconsistency that is awkward here:
Here's the current status of quantity for NWBFile.groups:
- acquisition: null
- analysis: null
- general: null
- intervals: '?'
- processing: null
- scratch: '?'
- stimulus: null
- units: '?'
(where null == 1 by default).
Some of these can be interpreted as being optional, eg. groups like processing consist of only *-quantitied groups, so I am treating those as Optional[Dict[str, ProcessingModule]] : https://github.com/p2p-ld/nwb-linkml/blob/77a852913c0ae036ca7285a3e5e5dc04fa3e8954/nwb_models/src/nwb_models/models/pydantic/core/v2_7_0/core_nwb_file.py#L258
But others, in particular general and stimulus would be much harder to make that inference for, and it's enough of a special case that i'd prefer not to do it. For both of those, all of their (recursive) child groups are optional, so it makes sense to also make them optional.
This is already how pynwb behaves for the null-aka-1-quantitied groups:
acquisitionis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L354analysisis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L356pynwbcurrently does not representgeneralas a separate object and has all of its groups and subgroups unpacked in thedocvalsignature and maps them back intogeneralin theNWBFileMap, soNWBFile.generalis technically optional because it doesn't exist as a discrete type.processingis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L373stimulusis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L359
So it would be nice to make the rest of the groups quantity ? to make that behavior ~schema official~, because at the moment requiring general and stimulus is technically correct behavior, even if all of their params are optional (and i don't think there's a syntactic construct in nwb schema language for "the default value is an empty instance of this class").
acquisitionis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L354
analysisis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L356
processingis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L373
stimulusis explicitly optional: https://github.com/NeurodataWithoutBorders/pynwb/blob/e938202b257e44c5536dc35476d55aa98e57d18d/src/pynwb/file.py#L359
Just to avoid confusion, I believe the parameters indicated here only mean that a user does not need to specify any contents to be stored in the groups. As far as I know, PyNWB still treats these groups as required and will generate empty groups if the user does not specify any contents. I.e., as far as I know, PyNWB (and MatNWB) indeed treat these groups as required.
pynwbcurrently does not representgeneralas a separate object and has all of its groups and subgroups unpacked in thedocvalsignature and maps them back intogeneralin theNWBFileMap, soNWBFile.generalis technically optional because it doesn't exist as a discrete type.
/general is being treated as required by PyNWB. /general is a required component (i.e,. with a quantity of null aka 1) because it is part of the NWBFile type. The group does not need to exist as its own type for it to be required.
Some of these can be interpreted as being optional, e.g. groups like
processingconsist of only*-quantitied groups, so I am treating those asOptional[Dict[str, ProcessingModule]]
acquisition, analysis, general, processing, and stimulus were all part of the original NWB 1.0 schema and the requirement for these groups to exist had carried over from there. intervals, units, and scratch were added later sometime during the development of NWB 2.x so they were added as optional groups. But that's just for historical background.
In general, I think it is reasonable to make groups that have no required contents optional. general/ contains several required metadata fields, so I believe it should remain as required. analysis and processing I believe could be optional. acquisition and stimulus I believe could also be made optional (but I'm not 100% sure right now for those two). In principle, I think it would be Ok to make some of the top-level groups that don't have required contents optional. I am not sure, however, how involved the changes in PyNWB would be to support this. Also, I suspect that some external tools may rely on the fact that these groups are currently required (but I'm not sure). To make a long story short, I think it would be Ok to explore the option to make some the groups optional, but I suspect that the coding effort required will likely not be as trivial as simply changing the quantity in the schema. This is a balance of allowing flexibility in the file structure vs. using a strict top-level hierarchy that others can depend on without having to check if the group (e.g,. /acquisition) actually exists.
@rly @bendichter thoughts?
I believe the parameters indicated here only mean that a user does not need to specify any contents to be stored in the groups
gotcha, makes sense. So I suppose there is some useful ux slippage between the schema and api - formally required by the schema/in the serialized file, but optional in the api with default to create a group during serialization. Reasonable call since a group doesn't have default_value and default_value i think only takes literals (?) so there isn't really such a thing as "this is required but when one isn't provided use an empty instance of this group" in the schema.
i am pretty sure (?) this is the only place where this happens, or at least where it's more complicated to to figure out which groups are technically-required-but-all-contents-are-optional because it has the deepest nesting. So if schema and serialization are more tightly bound than schema and API, i'm happy to just mimic that behavior and add this to my special cases handler in the generator by using blank SubModels() as defaults.
if this would be a complicating rather than a simplifying change, feel free to close it - it's also nice to be "given permission" to follow the pynwb behavior here, i'm trying to keep references in my docstrings to these explanations so people can see explanations for behaviors linked to each special case. trying to be useful with some passive documentation for yall :).
general/ contains several required metadata fields,
although now that you say this, i am looking at the schema and not seeing any required params in NWBFile.general and i'm worried i'm missing something (again) about the schema language...
although now that you say this, i am looking at the schema and not seeing any required params in
NWBFile.generaland i'm worried i'm missing something (again) about the schema language...
I don't think you are missing anything here. I thought some fields in /general were required (at least DANDI requires some, e.g., subject and species). Even without additional requirements from DANDI, in practice /general usually always has some metadata fields populated because, e.g., to store an ElectricalSeries one must have and ElectrodesTable which lives in /general. Similarly for ophys and icephys. I.e., while the fields in /general appear to be optional, /general in practice always has at least some metadata in it because they are being referenced (and hence needed) by other objects in the file.
but optional in the api with default to create a group during serialization
Since the top-level groups do not have their own type, they are represented in the API by NWBFile, so they don't need a separate instance of an object to represent them.
where it's more complicated to to figure out which groups are technically-required-but-all-contents-are-optional
If the quantity for a group is 1, then it is required by the schema (independent of whether everything else in the group may be optional).
if this would be a complicating rather than a simplifying change, feel free to close it
I'd like to hear from others what their thoughts are on making the top-level groups optional. I'm not opposed to the change, but I'm not sure right now whether this may open a hornets nest of downstream issues. I think the main complicating factor is probably the flexibility this adds, e.g., I'm not sure if/which tools may be relying on the fact that these groups exist (even if they are empty). @bendichter, @rly what do you think?
If the quantity for a group is 1, then it is required by the schema (independent of whether everything else in the group may be optional).
right right, just an API implementation detail like how pynwb does it. i think the file object is the only place where this needs special handling, so i'll just document that.
I'd be curious too - wonder how many downstream impls directly interact with the HDF5 file rather than through pynwb
I'd be in favor of with making top-level groups like aquisition and processing optional. I don't like having empty groups. It seems like it should be a simple change, but it may be trickier than it seems.
I agree with @bendichter . Having empty groups is confusing and inelegant, particularly when browsing NWBFile in an HDF5 viewer or neurosift. /stimulus/presentation gets me every time because stimulus is not empty but all of its child groups usually are. That can be addressed via the visualization tool, but it seems unnecessary. I see @oruebel 's point that some downstream tools may look for these groups outside of pynwb. I think we should make the change but in schema 2.9.0 (not the one coming up this month) so we have some more time to test it and also look at it in matlab.
Related:
https://github.com/NeurodataWithoutBorders/nwb-schema/issues/573
ah, my bad - feel free to close this since it looks like a dupe of #573