hdmf icon indicating copy to clipboard operation
hdmf copied to clipboard

Do not allow write with a `DynamicTableRegion` index that is out of bounds

Open rly opened this issue 6 years ago • 4 comments

Do not allow writing of a file with a DynamicTableRegion index that is out of bounds. The file is unreadable. Attempting to read it results in a cryptic error message (#209).

rly avatar Nov 21 '19 21:11 rly

I believe that check already exists right here:

https://github.com/hdmf-dev/hdmf/blob/55dc7dd2b51f4725bd818fb58eb8d811319279a2/src/hdmf/common/table.py#L692-L696

Anytime DynamicTableRegion data is set (i.e. read or write), this brute force check happen. For example, it took at least 10 minutes to open a file that had a DynamicTableRegion with more than 2.4M elements. So, I don't think we want to go this route for data integrity.

ajtritt avatar Jan 02 '20 17:01 ajtritt

We found a number of files from the DataJoint team that had this error and were unopenable in pynwb. I'm not sure how they managed to create these files in the first place with this check in place. I'm pretty sure they were using pynwb to write. Was this check recently added?

As for files taking a long time to open, I think that is probably due to automatically dereferencing DynamicTableRegions during open, and I think we should reconsider doing that.

bendichter avatar Jan 02 '20 17:01 bendichter

I think that is probably due to automatically dereferencing DynamicTableRegions during open

It’s reading one element at a time from the index—not the dereferenced data.

I think these checks need to move elsewhere, and shouldn’t be enforced in setters. It makes maintenance and tracking of these sort of rules difficult, and creates performance issues. We probably need some sort of constraints layer to enforce these rules.

ajtritt avatar Jan 02 '20 18:01 ajtritt

We should do this for all DynamicTableRegions that are small enough (e.g., <10000 elements) and have that threshold (or the behavior) be configurable on write. It would be ideal to check this everytime a value is changed, but we cannot monitor all changes because we use numpy and h5py.dataset objects. A constraints layer would be nice, but for now, this can be done on write.

rly avatar Jan 05 '23 21:01 rly