Do not allow writing a file with a `DynamicTableRegion` index that is out of bounds
Do not allow writing a file with a `DynamicTableRegion` index that is out of bounds. Such a file is unreadable: attempting to read it results in a cryptic error message (#209).
I believe that check already exists right here:
https://github.com/hdmf-dev/hdmf/blob/55dc7dd2b51f4725bd818fb58eb8d811319279a2/src/hdmf/common/table.py#L692-L696
Any time DynamicTableRegion data is set (i.e., on read or write), this brute-force check happens. For example, it took at least 10 minutes to open a file that had a DynamicTableRegion with more than 2.4M elements. So I don't think we want to go this route for data integrity.
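For reference, the kind of check being discussed looks roughly like the loop below (a sketch, not the actual hdmf code); a vectorized variant is included to show the trade-off:

```python
import numpy as np

def check_region_loop(region_data, table_len):
    # Per-element check, roughly the shape of a setter-level validation:
    # O(n) Python-level iterations, which is what makes a region with
    # millions of elements take minutes to validate.
    for val in region_data:
        if not 0 <= val < table_len:
            raise IndexError(
                "index %d out of range for table of length %d" % (val, table_len)
            )

def check_region_vectorized(region_data, table_len):
    # Single vectorized pass over an in-memory copy; much cheaper, but it
    # still forces a full read when region_data is a lazily loaded dataset.
    data = np.asarray(region_data)
    if data.size and (data.min() < 0 or data.max() >= table_len):
        raise IndexError("region contains indices outside [0, %d)" % table_len)
```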
We found a number of files from the DataJoint team that had this error and were unopenable in pynwb. I'm not sure how they managed to create these files in the first place with this check in place. I'm pretty sure they were using pynwb to write. Was this check recently added?
As for files taking a long time to open, I think that is probably due to automatically dereferencing DynamicTableRegions during open, and I think we should reconsider doing that.
> I think that is probably due to automatically dereferencing DynamicTableRegions during open
It's reading one element at a time from the index that is slow, not dereferencing the data.
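To illustrate the difference, a minimal sketch assuming the index is stored as an `h5py.Dataset` (the function names are only illustrative):

```python
import h5py
import numpy as np

def read_index_elementwise(dset: h5py.Dataset) -> list:
    # One HDF5 read per element: this access pattern is what makes opening
    # a file with a multi-million-element index so slow.
    return [int(dset[i]) for i in range(len(dset))]

def read_index_bulk(dset: h5py.Dataset) -> np.ndarray:
    # A single bulk read into memory; bounds checking can then be done in
    # one vectorized pass over the array.
    return np.asarray(dset[()])
```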
I think these checks need to move elsewhere and shouldn't be enforced in setters. Enforcing them there makes maintaining and tracking this sort of rule difficult, and it creates performance issues. We probably need some sort of constraints layer to enforce these rules.
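For what it's worth, a constraints layer could be as simple as a registry of named rules checked at well-defined points (e.g., just before write) instead of inside setters. Everything below is purely illustrative, not an existing or proposed hdmf API:

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Constraint:
    name: str
    check: Callable[[Any], bool]  # returns True when the object satisfies the rule
    message: str

def validate(obj: Any, constraints: List[Constraint]) -> None:
    # Run all constraints at a chosen time (e.g., on write) and report every
    # violation at once, rather than failing inside individual setters.
    errors = [c.message for c in constraints if not c.check(obj)]
    if errors:
        raise ValueError("; ".join(errors))

# Example rule: all DynamicTableRegion indices fall within the referenced table.
dtr_in_bounds = Constraint(
    name="dtr_index_in_bounds",
    check=lambda dtr: all(0 <= i < len(dtr.table) for i in dtr.data),
    message="DynamicTableRegion contains out-of-bounds indices",
)
```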
We should do this for all DynamicTableRegions that are small enough (e.g., <10,000 elements) and make that threshold (or the behavior) configurable on write. It would be ideal to check this every time a value is changed, but we cannot monitor all changes because we use numpy arrays and h5py.Dataset objects. A constraints layer would be nice, but for now, this can be done on write.
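A rough sketch of what that write-time hook could look like (the function, parameter names, and default threshold are hypothetical):

```python
import numpy as np

def validate_region_on_write(region_data, table_len, max_check_size=10_000):
    """Bounds-check a DynamicTableRegion index just before it is written.

    Regions larger than ``max_check_size`` are skipped so that very large
    indices do not slow down writes; the threshold (or the check itself)
    would be exposed as a user-facing write option.
    """
    if max_check_size is not None and len(region_data) > max_check_size:
        return  # too large to validate cheaply; caller opted out
    data = np.asarray(region_data[:])  # handles lists, numpy arrays, and h5py datasets
    if data.size and (data.min() < 0 or data.max() >= table_len):
        raise ValueError(
            "DynamicTableRegion indices must be in [0, %d); the data contains "
            "values outside this range" % table_len
        )
```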