hdmf
[Bug]: DTR validation assumes table is set and raises warning aggressively
What happened?
See https://github.com/catalystneuro/ndx-photometry/issues/8
Using HDMF 3.7.0, when a table T1 containing a DynamicTableRegion R is added to another container T2, HDMF checks whether DynamicTableRegion R references a table that shares an ancestor with T2 to warn the user that the referenced table hasn't been added to the file hierarchy.
This check assumes that R.table has been set, which is not always the case on initialization. An error is raised when it is not yet set. We should check for that case.
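A minimal sketch of what the guarded check could look like (all names here are illustrative stand-ins, not the actual HDMF internals):

```python
import warnings

class RegionStub:
    """Hypothetical stand-in for a DynamicTableRegion whose `table`
    attribute may not have been set yet on initialization."""
    def __init__(self, table=None):
        self.table = table

def check_region_table(region, known_tables):
    """Warn only when the referenced table is set but missing from the
    known hierarchy; skip the check entirely when `table` is unset."""
    if region.table is None:
        return  # not yet set on init -- nothing to validate, no error raised
    if region.table not in known_tables:
        warnings.warn("DynamicTableRegion references a table that has not "
                      "been added to the file hierarchy")

# unset table: the check is skipped instead of raising
check_region_table(RegionStub(), known_tables=[])
```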
Also, when T1 is added to T2, it is not necessarily the case that R.table should already have been added to an ancestor of T2. The warning seems premature and aggressively requires a particular order of operations that I think is awkward. See my changes in https://github.com/NeurodataWithoutBorders/pynwb/pull/1778/files . For example, to create objects in PyNWB before adding them to an NWBFile, you have to do something like:
```python
pm = ProcessingModule(...)
ps = create_plane_segmentation()
pm.add(ps)
ff = Fluorescence()
pm.add(ff)
rt_region = ps.create_roi_table_region(...)
rrs = RoiResponseSeries(...)
ff.add_roi_response_series(rrs)
```
instead of the following, which used to work without a warning and feels more natural:
```python
pm = ProcessingModule(...)
ps = create_plane_segmentation()
rt_region = ps.create_roi_table_region(...)
rrs = RoiResponseSeries(...)
ff = Fluorescence(rrs)  # create a Fluorescence obj and immediately add rrs - this now raises a warning
pm.add(ps)
pm.add(ff)
```
I think we should relax this check so that it only runs when adding objects to the top-level File object. To accommodate that, we could either create a new class `HdmfFile` that another class like `NWBFile` can inherit from, or add a new class variable `__is_root_type` on `Container` that indicates whether the class represents a top-level File object, which `NWBFile` and any other top-level class from an HDMF extension can set to `True`. Option 1 supports code reuse, but there's no code to put in `HdmfFile`. Option 2 is more flexible and avoids edge cases that can occur when generating classes with multiple inheritance, so I prefer Option 2.
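A rough sketch of Option 2 (class names and the lookup helper here are illustrative, not the actual HDMF API; the double underscore triggers Python name mangling, so the lookup walks the MRO):

```python
class Container:
    __is_root_type = False  # subclasses representing a top-level File set this to True

    @classmethod
    def get_is_root_type(cls):
        # __is_root_type is name-mangled per class, so check each class in the MRO
        for c in cls.__mro__:
            val = getattr(c, "_%s__is_root_type" % c.__name__, None)
            if val is not None:
                return val
        return False

class FileLikeContainer(Container):
    __is_root_type = True  # e.g. NWBFile, or a top-level class from an extension

class TableLikeContainer(Container):
    pass  # ordinary containers keep the default (False)

# the relaxed DTR check would then only fire when adding objects
# to a container whose class reports get_is_root_type() == True
```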
Steps to Reproduce
See https://github.com/catalystneuro/ndx-photometry/issues/8
Traceback
No response
Operating System
macOS
Python Executable
Conda
Python Version
3.11
Package Versions
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
- [X] Have you checked the Contributing document?
- [X] Have you ensured this bug was not already reported?
> add a new class variable `__is_root_type` on `Container` that indicates whether the class represents a top-level File object that `NWBFile`
In principle, any type can be a root; the only restriction, I believe, is that the name must be `root`. As such, rather than a class-level `__is_root_type`, I would suggest maybe just having a `__is_root` object property that is set in `__init__`. We could add an optional `is_root` argument on `AbstractContainer` that is `False` by default; if `True`, we could also enforce that the name is set to `"root"`, which would help avoid other issues as well. Alternatively, we could also just use the check `if name == 'root'`, which I think would work too, but is a bit more hacky, because it doesn't allow for containers to be named `root` while not being the root, and it introduces an implicit convention.
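The constructor-argument idea could look roughly like this (a sketch only; `AbstractContainerSketch` is a hypothetical stand-in for `AbstractContainer`, not the real class):

```python
class AbstractContainerSketch:
    """Hypothetical AbstractContainer with an optional is_root flag
    that enforces the 'root' naming convention on construction."""

    def __init__(self, name, is_root=False):
        if is_root and name != "root":
            raise ValueError("a root container must be named 'root'")
        self.name = name
        self.__is_root = is_root  # set once at init, exposed read-only below

    @property
    def is_root(self):
        return self.__is_root

root = AbstractContainerSketch("root", is_root=True)
table = AbstractContainerSketch("root")  # named 'root' but not the root -- still allowed
```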
> check `if name == 'root'`
That being said, this is the logic that at least ZarrIO is currently using:
https://github.com/hdmf-dev/hdmf-zarr/blob/70bf35b60ed2c2eaae7b12080bc1f4cc3d89ba3e/src/hdmf_zarr/backend.py#L545
However, having an explicit variable to indicate whether a container is the root would be useful. I don't think such a change to the API would affect many users, but it would likely affect a lot of tests in HDMF (and possibly in PyNWB).
I like the idea of passing it into the constructor. That is effectively how LinkML uses their tree_root property, and it makes sense.
Also, the root object doesn't have a name on disk. The name is set to "root" for the root container in memory on read. On write, the name of the root container is currently ignored and just happens to be set to "root" in the case of NWBFile.
> That being said, this is the logic that at least ZarrIO is currently using:
So ZarrIO will not currently work correctly when writing/reading a DynamicTable or other Container named something else?
@oruebel At least in HDF5, the name of the root container is not written. Does Zarr write the root container name to disk?
> At least in HDF5, the name of the root container is not written. Does Zarr write the root container name to disk?
Same thing in Zarr. But both ZarrIO and HDF5IO define `ROOT_NAME = "root"` as a variable to specify that the root container must be named `root`.
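The shared convention can be summarized with a small sketch (illustrative only; the actual path handling in HDF5IO and ZarrIO is more involved):

```python
ROOT_NAME = "root"  # both backends define a constant like this for the in-memory root name

def on_disk_path(name, parent_path=None):
    """Hypothetical path resolution: the root container's in-memory name
    is never written to disk; it simply maps to the file's root group."""
    if parent_path is None:
        # top-level container: its name is ignored on write
        return "/"
    return parent_path.rstrip("/") + "/" + name
```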