
[Bug]: Invalid dataset identifier

Open rcpeene opened this issue 1 year ago • 4 comments

What happened?

Accessing critical data fields in some NWB files fails with "Invalid dataset identifier" when trying to access any element of a TimeSeries. Printing the TimeSeries shows a "closed dataset identifier".

Context: We have NWB files packaged on one machine and uploaded to DANDI. I then downloaded one of the files and attempted to open it with my code, which fails as described above. The same code works fine for opening other NWB files with similar data inside. This seems related to issue #1666, where the io object is garbage collected and the data arrays are lost. For reasons relating to my project, the workaround mentioned there is impractical: the function dandi_download_open, which contains my NWBHDF5IO and io.read lines, belongs in a different file.

It must also be related to some differing dependencies, but I struggle to identify exactly how, seeing as other similar NWB files that were packaged with different versions of PyNWB, HDMF, and h5py open just fine on my machine with the same code. Perhaps it is an issue with my version of hdmf being > 3.5.0? When I revert to hdmf==3.4.2, this error does not occur. I'd like to be able to open valid files without having to stick with older versions of PyNWB and the like.

Steps to Reproduce

Run the following code in a Jupyter notebook

# downloads an NWB file from DANDI to download_loc, opens it, and returns the NWB object
# dandi_api_key is required to access files from embargoed dandisets
def dandi_download_open(dandiset_id, dandi_filepath, download_loc, dandi_api_key=None, force_overwrite=False):
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
    dandiset = client.get_dandiset(dandiset_id)

    file = dandiset.get_asset_by_path(dandi_filepath)
    file_url = file.download_url

    filename = dandi_filepath.split("/")[-1]
    filepath = f"{download_loc}/{filename}"

    if os.path.exists(filepath) and not force_overwrite:
        print("File already exists")
    else:
        download.download(file_url, output_dir=download_loc)
        print(f"Downloaded file to {filepath}")

    print("Opening file")
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    nwb = io.read()
    return nwb
dandiset_id = "000535"
dandi_filepath = "sub-460654/sub-460654_ses-20190611T181840_behavior+ophys.nwb"
nwb = dandi_download_open(dandiset_id, dandi_filepath, download_loc, dandi_api_key=dandi_api_key)
print(nwb.processing["ophys"]["DfOverF"].roi_response_series["RoiResponseSeries"].data[0])

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[37], line 1
----> 1 nwb.processing["behavior"].data_interfaces["BehavioralTimeSeries"].time_series["running_velocity"].data[:]

File h5py\_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py\_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File c:\Users\carter.peene\AppData\Local\Programs\Python\Python39\lib\site-packages\h5py\_hl\dataset.py:756, in Dataset.__getitem__(self, args, new_dtype)
    744 """ Read a slice from the HDF5 dataset.
    745 
    746 Takes slices and recarray-style field names (more than one is
   (...)
    752 * Boolean "mask" array indexing
    753 """
    754 args = args if isinstance(args, tuple) else (args,)
--> 756 if self._fast_read_ok and (new_dtype is None):
    757     try:
    758         return self._fast_reader.read(args)

File c:\Users\carter.peene\AppData\Local\Programs\Python\Python39\lib\site-packages\h5py\_hl\base.py:536, in cached_property.__get__(self, obj, cls)
    533 if obj is None:
    534     return self
--> 536 value = obj.__dict__[self.func.__name__] = self.func(obj)
...
File h5py\_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py\h5d.pyx:350, in h5py.h5d.DatasetID.get_space()

ValueError: Invalid dataset identifier (invalid dataset identifier)

Operating System

Windows

Python Executable

Python

Python Version

3.9

Package Versions

Environment of the machine that generated the problem files:

jerome.lecoq@OSXLTCYGQCV ~ % pip freeze
aiohttp==3.7.4
anyio==3.6.2
appdirs==1.4.4
appnope @ file:///Users/ktietz/ci_310/appnope_1643965056645/work
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
argschema==3.0.4
arrow==1.2.3
asciitree==0.3.3
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
async-timeout==3.0.1
attrs==22.1.0
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
beautifulsoup4==4.11.1
bidsschematools==0.6.0
bleach==5.0.1
blessings==1.7
blosc2==2.0.0
boto3==1.26.117
botocore==1.29.117
bqplot==0.12.36
cachetools==4.2.4
ccfwidget==0.1.0
certifi==2023.5.7
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
ci-info==0.3.0
click==8.1.3
click-didyoumean==0.3.0
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work
comm @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_b19kb7be6_/croot/comm_1671231124262/work
contourpy==1.0.6
cycler==0.11.0
Cython==0.29.32
dandi==0.54.0
dandischema==0.8.2
debugpy @ file:///Users/ktietz/ci_310/debugpy_1643965577625/work
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
defusedxml==0.7.1
distro==1.8.0
dnspython==2.2.1
email-validator==1.3.0
entrypoints @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_croot-jb01gaox/entrypoints_1650293758411/work
etelemetry==0.3.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1678703645500/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
fastapi==0.95.0
fasteners==0.18
fastjsonschema==2.16.2
fonttools==4.38.0
fqdn==1.5.1
fscacher==0.3.0
fsspec==2022.11.0
future==0.18.2
gast==0.4.0
Glymur==0.8.19
h11==0.14.0
h5py==3.8.0
hdbscan==0.8.29
hdmf==3.5.5
humanize==4.4.0
idna==3.4
imageio==2.23.0
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1682176699712/work
iniconfig @ file:///home/conda/feedstock_root/build_artifacts/iniconfig_1673103042956/work
interleave==0.2.1
ipydatagrid==1.1.14
ipydatawidgets==4.3.2
ipykernel @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_bappucl7zp/croot/ipykernel_1671488382153/work
ipympl==0.9.2
ipython @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_6evw3wmnra/croot/ipython_1670919318109/work
ipython-genutils==0.2.0
ipyvolume==0.6.0a10
ipyvue==1.8.0
ipyvuetify==1.8.4
ipywebrtc==0.6.0
ipywidgets==7.7.2
isodate==0.6.1
isoduration==20.11.0
jaraco.classes==3.2.3
jedi @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/croot-f1t6hma6/jedi_1644315882177/work
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
jsonpointer==2.3
jsonschema==4.17.3
jupyter-events==0.5.0
jupyter_client @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_b6tvppu00c/croot/jupyter_client_1671703056848/work
jupyter_core @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_aa_owuo5e_/croot/jupyter_core_1672332232507/work
jupyter_server==2.0.6
jupyter_server_terminals==0.4.3
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
keyring==23.13.1
keyrings.alt==4.2.0
kiwisolver==1.4.4
line-profiler==4.0.3
llvmlite==0.39.1
MarkupSafe==2.1.1
marshmallow==3.19.0
matplotlib==3.4.2
matplotlib-inline @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_f6fdc0hldi/croots/recipe/matplotlib-inline_1662014472341/work
mistune==2.0.4
more-itertools==9.0.0
msgpack==1.0.4
multidict==6.0.4
natsort==8.2.0
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.7
nbformat==5.7.1
ndx-events==0.2.0
ndx-grayscalevolume==0.0.2
ndx-icephys-meta==0.1.0
ndx-spectrum==0.2.2
nest-asyncio @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_6b_e0dr4lw/croot/nest-asyncio_1672387130036/work
networkx==2.8.8
notebook==6.5.2
notebook_shim==0.2.2
numba==0.56.4
numcodecs==0.11.0
numexpr==2.8.4
numpy==1.23.5
nwbinspector==0.4.27
nwbwidgets==0.10.0
packaging @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_952b3b8pj8/croot/packaging_1671697425767/work
pandas==1.5.2
pandocfilters==1.5.0
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
patsy==0.5.3
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
Pillow==9.3.0
platformdirs @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_f7wx6m2jsp/croots/recipe/platformdirs_1662711384790/work
plotly==5.11.0
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1667232663820/work
prometheus-client==0.15.0
prompt-toolkit @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_0blbsngvis/croot/prompt-toolkit_1672387317724/work
psutil @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_1310b568-21f4-4cb0-b0e3-2f3d31e39728k9coaga5/croots/recipe/psutil_1656431280844/work
psycopg2-binary==2.9.5
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
py-cpuinfo==9.0.0
py2vega==0.6.1
pycparser==2.21
pycryptodomex==3.16.0
pydantic==1.10.4
Pygments @ file:///opt/conda/conda-bld/pygments_1644249106324/work
pynndescent==0.5.8
pynrrd==0.4.3
pynwb==2.3.2
pyout==0.7.2
pyparsing==3.0.9
pyrsistent==0.19.2
pytest==7.3.1
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
python-json-logger==2.0.4
pythreejs==2.4.1
pytz==2022.6
PyWavelets==1.4.1
PyYAML==6.0
pyzmq @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_a7d9jpero7/croot/pyzmq_1682697648735/work
requests==2.28.1
requests-toolbelt==0.10.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rfc3987==1.3.8
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
s3transfer==0.6.0
scikit-build==0.16.4
scikit-image==0.19.3
scikit-learn==1.2.0
scipy==1.9.3
seaborn==0.12.1
semantic-version==2.10.0
semver==2.13.0
Send2Trash==1.8.0
SimpleITK==2.2.1
simplejson==3.18.0
six @ file:///tmp/build/80754af9/six_1644875935023/work
sniffio==1.3.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.45
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
starlette==0.26.1
statsmodels==0.13.0
tables==3.8.0
tenacity==8.1.0
terminado==0.17.1
threadpoolctl==3.1.0
tifffile==2022.10.10
tinycss2==1.2.1
tomli @ file:///home/conda/feedstock_root/build_artifacts/tomli_1644342247877/work
tornado @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_a61b4xoie9/croots/recipe/tornado_1662061692951/work
tqdm==4.64.1
traitlets @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_6301rd5qbe/croot/traitlets_1671143894285/work
traittypes==0.2.1
trimesh==3.17.1
typing_extensions==4.4.0
umap==0.1.1
umap-learn==0.5.3
uri-template==1.2.0
urllib3==1.26.13
uvicorn==0.21.1
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
webcolors==1.12
webencodings==0.5.1
websocket-client==1.4.2
widgetsnbextension==3.6.1
xarray==2022.12.0
yarl==1.8.2
zarr==2.13.3
zarr-checksum==0.2.8
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1677313463193/work

Environment of my machine

accessible-pygments==0.0.4
aiohttp==3.8.3
aiosignal==1.3.1
alabaster==0.7.12
anyio==3.6.2
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
argschema==2.0.2
arrow==1.2.3
asciitree==0.3.3
asttokens==2.2.0
async-timeout==4.0.2
attrs==21.4.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
bg-atlasapi==1.0.2
bg-space==0.6.0
bidsschematools==0.6.0
bleach==5.0.1
boto3==1.17.21
botocore==1.20.112
bqplot==0.12.36
brainrender==2.0.5.5
bs4==0.0.1
cachetools==4.2.4
ccfwidget==0.5.3
certifi==2022.9.24
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
ci-info==0.3.0
click==8.1.3
click-didyoumean==0.3.0
cloudpickle==2.2.0
colorama==0.4.6
colorcet==3.0.1
commonmark==0.9.1
contourpy==1.0.6
coverage==7.2.1
cycler==0.11.0
dandi==0.46.6
dandischema==0.7.1
dask==2022.11.1
databook-utils @ file:///C:/Users/carter.peene/Desktop/Projects/openscope_databook
debugpy==1.6.4
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.13
distro==1.8.0
dnspython==2.2.1
docutils==0.17.1
elephant==0.12.0
email-validator==1.3.0
entrypoints==0.4
etelemetry==0.3.0
exceptiongroup==1.1.0
execnet==1.9.0
executing==1.2.0
fasteners==0.18
fastjsonschema==2.16.2
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fscacher==0.2.0
fsspec==2022.11.0
future==0.18.2
gast==0.4.0
gitdb==4.0.9
GitPython==3.1.27
Glymur==0.8.19
google==3.0.0
greenlet==1.1.3
h5py==3.9.0
hdmf==3.6.1
humanize==4.4.0
idna==3.4
imagecodecs==2022.9.26
imageio==2.22.4
imagesize==1.4.1
importlib-metadata==4.13.0
importlib-resources==5.10.0
iniconfig==2.0.0
interleave==0.2.1
ipycanvas==0.13.1
ipydatagrid==1.1.14
ipydatawidgets==4.3.2
ipyevents==2.0.1
ipykernel==6.17.1
ipympl==0.9.2
ipysheet==0.5.0
ipython==8.7.0
ipython-genutils==0.2.0
ipytree==0.2.2
ipyvolume==0.6.0a10
ipyvtklink==0.2.3
ipyvue==1.8.0
ipyvuetify==1.8.4
ipywebrtc==0.6.0
ipywidgets==7.7.2
isoduration==20.11.0
itk-core==5.3.0
itk-filtering==5.3.0
itk-meshtopolydata==0.10.0
itk-numerics==5.3.0
itkwidgets==0.32.4
jaraco.classes==3.2.3
jedi==0.18.2
Jinja2==3.1.2
JIT==0.0.1
jmespath==0.10.0
joblib==1.2.0
jsonpointer==2.3
jsonschema==3.2.0
jupyter==1.0.0
jupyter-book==0.13.0
jupyter-cache==0.4.3
jupyter-console==6.4.4
jupyter-server==1.23.3
jupyter-server-mathjax==0.2.6
jupyter-sphinx==0.3.2
jupyter_client==7.4.7
jupyter_core==5.1.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
K3D==2.7.4
keyring==23.11.0
keyrings.alt==4.2.0
kiwisolver==1.4.4
latexcodec==2.0.1
linkify-it-py==1.0.3
llvmlite==0.39.1
locket==1.0.0
loguru==0.6.0
lxml==4.9.1
markdown-it-py==1.1.0
MarkupSafe==2.1.1
marshmallow==3.0.0rc6
matplotlib==3.6.2
matplotlib-inline==0.1.6
matplotlib-venn==0.11.9
mdit-py-plugins==0.2.8
meshio==5.3.4
mistune==0.8.4
more-itertools==9.0.0
morphapi==0.1.7
MorphIO==3.3.3
mpl-interactions==0.22.0
msgpack==1.0.4
multidict==6.0.2
myst-nb==0.13.2
myst-parser==0.15.2
myterial==1.2.1
natsort==8.2.0
nbclassic==0.4.8
nbclient==0.5.13
nbconvert==6.5.4
nbdime==3.1.1
nbformat==5.7.0
nbmake==1.3.5
ndx-events==0.2.0
ndx-grayscalevolume==0.0.2
ndx-icephys-meta==0.1.0
ndx-spectrum==0.2.2
neo==0.12.0
nest-asyncio==1.5.6
networkx==2.8.8
neurom==3.2.2
notebook==6.5.2
notebook_shim==0.2.2
numba==0.56.4
numcodecs==0.10.2
numexpr==2.8.3
numpy==1.22.4
nwbinspector==0.4.20
nwbwidgets==0.10.0
opencv-python==4.6.0.66
ophys-nway-matching @ git+https://github.com/AllenInstitute/ophys_nway_matching@545504ab55922717ab623f8ede2c521a60aa1458
packaging==21.3
pandas==1.5.2
pandocfilters==1.5.0
param==1.12.2
parso==0.8.3
partd==1.3.0
patsy==0.5.3
pickleshare==0.7.5
Pillow==9.3.0
pkgutil_resolve_name==1.3.10
platformdirs==2.5.4
plotly==5.11.0
pluggy==1.0.0
prometheus-client==0.15.0
prompt-toolkit==3.0.33
psutil==5.9.4
psycopg2-binary==2.9.5
pure-eval==0.2.2
py==1.11.0
py2vega==0.6.1
pybtex==0.24.0
pybtex-docutils==1.0.2
pycparser==2.21
pycryptodomex==3.16.0
pyct==0.4.8
pydantic==1.10.2
pydata-sphinx-theme==0.8.1
Pygments==2.13.0
pyinspect==0.1.0
pynrrd==0.4.3
pynwb==2.3.2
pyout==0.7.2
pyparsing==3.0.9
PyPDF2==3.0.1
pyrsistent==0.19.2
pytest==7.2.1
pytest-cov==4.0.0
pytest-xdist==3.2.1
python-dateutil==2.8.2
pythreejs==2.4.1
pytz==2022.6
PyWavelets==1.4.1
pywin32==306
pywin32-ctypes==0.2.0
pywinpty==2.0.10
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
quantities==0.14.1
requests==2.28.1
requests-toolbelt==0.10.1
retry==0.9.2
rfc3339-validator==0.1.4
rfc3987==1.3.8
rich==12.6.0
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
s3transfer==0.3.7
scikit-build==0.16.4
scikit-image==0.19.3
scikit-learn==1.1.2
scipy==1.9.3
seaborn==0.12.1
semantic-version==2.10.0
semver==2.13.0
Send2Trash==1.8.0
SimpleITK==2.2.1
simplejson==3.18.0
six==1.16.0
smmap==5.0.0
sniffio==1.3.0
snowballstemmer==2.2.0
soupsieve==2.3.2.post1
Sphinx==4.5.0
sphinx-book-theme==0.3.3
sphinx-comments==0.0.3
sphinx-copybutton==0.5.0
sphinx-external-toc==0.2.4
sphinx-jupyterbook-latex==0.4.7
sphinx-multitoc-numbering==0.1.3
sphinx-thebe==0.1.2
sphinx-togglebutton==0.3.2
sphinx_design==0.1.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-bibtex==2.5.0
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==1.4.41
stack-data==0.6.2
statsmodels==0.13.0
strict-rfc3339==0.7
tables==3.7.0
tabulate==0.9.0
tenacity==8.1.0
terminado==0.17.0
threadpoolctl==3.1.0
tifffile==2022.10.10
tinycss2==1.2.1
tomli==2.0.1
toolz==0.12.0
tornado==6.2
tqdm==4.64.1
traitlets==5.6.0
traittypes==0.2.1
treelib==1.6.1
trimesh==3.16.4
typing_extensions==4.4.0
uc-micro-py==1.0.1
uri-template==1.2.0
urllib3==1.26.13
util-colleenjg==0.0.1
vedo==2021.0.5
vtk==9.2.2
wcwidth==0.2.5
webcolors==1.12
webencodings==0.5.1
websocket-client==1.4.2
widgetsnbextension==3.6.1
win32-setctime==1.1.0
wrapt==1.14.1
wslink==1.8.4
xarray==2022.11.0
yarl==1.8.1
zarr==2.13.3
zipp==3.11.0
zstandard==0.19.0


rcpeene avatar Jun 26 '23 22:06 rcpeene

@rcpeene thanks for the bug report. What did you import in your example code? In the future, it would be really helpful for us to be able to open a blank Jupyter notebook and copy your code in without modification to run it.

I made a slight modification to run the code (not sure where download comes from) and was able to reproduce the error in DANDI Hub. I think the error comes from the fact that after Python execution exits your function, all variables defined in the function that are not returned are deleted and garbage collected. This closes the file. An alternative approach is to pass around the open NWBHDF5IO object between function calls and call io.read() when you want to read it. This works for me. Would that work for your use case?

def dandi_download_open(dandiset_id, dandi_filepath, download_loc, dandi_api_key=None, force_overwrite=False):
    # <snip>
    io = NWBHDF5IO(filepath, mode="r", load_namespaces=True)
    return io

# <snip>
nwb = io.read()
print(nwb.processing["ophys"]["DfOverF"].roi_response_series["RoiResponseSeries"].data)
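The garbage-collection behavior described above can be sketched without any NWB files at all. The following is a minimal stand-in (FakeIO is hypothetical, not the real NWBHDF5IO API) that mimics what hdmf >= 3.5 does: close the underlying file when the io object is deleted.

```python
import gc

CLOSED = []  # records which io objects have closed their "file"

class FakeIO:
    """Hypothetical stand-in for NWBHDF5IO, mimicking hdmf >= 3.5:
    the underlying file is closed when the io object is deleted."""
    def __init__(self, name):
        self.name = name
        self.file_open = True

    def __del__(self):
        self.file_open = False
        CLOSED.append(self.name)  # hdmf >= 3.5 closes the file here

def open_without_returning_io():
    io = FakeIO("scoped")
    return "data"  # io goes out of scope: garbage collected, file closed

def open_returning_io():
    io = FakeIO("returned")
    return io  # caller keeps the io (and hence the file) alive

data = open_without_returning_io()
gc.collect()
print("scoped" in CLOSED)    # True: the io behind `data` closed its file

io = open_returning_io()
print(io.file_open)          # True: returning the io keeps the file open
print("returned" in CLOSED)  # False: not garbage collected yet
```

This is why returning the io object (or keeping it alive in some other way) avoids the "Invalid dataset identifier" error: the file is only closed once nothing references the io anymore.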

rly avatar Jun 27 '23 04:06 rly

My apologies. The import code is as follows.

import h5py
import os

from dandi import download
from dandi import dandiapi
from fsspec.implementations.cached import CachingFileSystem
from fsspec import filesystem
from pynwb import NWBHDF5IO

Your solution might suffice. The only legitimate reason I have for not returning the io object is to follow good modularity/encapsulation practices. For the time being, I may prefer just keeping my dependencies at their older versions. Any ideas as to why this occurs only with NWB files generated with HDMF >= 3.5?

rcpeene avatar Jun 27 '23 17:06 rcpeene

This issue is also blocking the allensdk from adopting hdmf>=3.5.0. https://github.com/AllenInstitute/AllenSDK/blob/master/allensdk/brain_observatory/nwb/nwb_api.py#L28

The cause is the change in https://github.com/hdmf-dev/hdmf/pull/811, where HDMF now closes the file when the IO object gets deleted (e.g., when an IO object defined within a function goes out of scope). I believe this was done to better handle HDF5/Zarr compatibility. @oruebel do you remember why we added this? (In general, I think it is good to auto-close the file when the IO object is gone, but some code relies on the file staying open until execution ends.)

rly avatar Jul 26 '23 03:07 rly

If I understand the issue correctly, the problem is that the NWBHDF5IO object is being deleted (here because it is created within the scope of a function) but you want to keep using the NWBFile object that was read with that deleted NWBHDF5IO object. Before diving into the details below, the short answer is that I believe https://github.com/hdmf-dev/hdmf/pull/882, which was included in the HDMF 3.7 release, should enable this usage pattern and fix the issue you are describing.

do you remember why we added this

Not closing files correctly leads to a few different issues in practice: 1) we can't delete files (e.g., with os.remove) because the OS (especially Windows) will block the operation with a permission-denied error if the file has not been closed properly; this comes up in particular in unit tests that need to delete files they created. 2) It can prevent reopening a file with different settings; HDF5 does not allow opening a file simultaneously with incompatible access modes, so if a file was first opened in 'r' mode and you then want to open it in 'a' mode, this will fail if the 'r'-mode handle was not closed correctly. In the context of Zarr, not closing the backend comes up in particular when integrating database backends (e.g., SQL), which are very sensitive to database connections being closed correctly.

in general, I think it is good to auto-close the file when the IO object is gone

Yes, any file that is being opened should always be closed explicitly. In the context of HDMF, files are ultimately opened and owned by the HDMFIO object (here the NWBHDF5IO object).

but some codes rely on the fact that the file is still open until execution ends

The issue here is really that the code needs to ensure that the HDMFIO object used for reading does not get deleted (and remains open) as long as the NWBFile is being used. Both the IO and the NWBFile objects are created and owned by the user code, so it is possible for user code to accidentally delete the io object (here, by not returning it) before the NWBFile is deleted. To prevent this from happening, HDMF#882 added the AbstractContainer.read_io property so that the io object used for reading is cached on the container (i.e., the NWBFile) object. This accomplishes two things: 1) the NWBFile keeps a reference to the io object, so the Python garbage collector won't (accidentally) delete the io object while the NWBFile object is still being used; 2) you can access the io object directly from the NWBFile, so even though your dandi_download_open function explicitly returns only the NWBFile object, the NWBHDF5IO object is now also returned as part of the nwb.read_io property of the NWBFile. That is, this approach addresses the case where Python garbage-collects the io object because the user code does not explicitly keep references to both the NWBFile and the NWBHDF5IO object, which I believe is the issue you are describing.
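The read_io caching mechanism can be sketched with the same kind of stand-in objects (FakeIO and FakeContainer are hypothetical; only the read_io name reflects the real HDMF 3.7 attribute). The container holds a reference back to the io that read it, so the io survives even when the function returns only the container:

```python
import gc

class FakeIO:
    """Hypothetical stand-in for NWBHDF5IO (closes its file on deletion)."""
    def __init__(self):
        self.file_open = True

    def __del__(self):
        self.file_open = False

    def read(self):
        container = FakeContainer()
        container.read_io = self  # HDMF#882: cache the io on the container
        return container

class FakeContainer:
    """Hypothetical stand-in for an NWBFile container."""
    read_io = None

def open_file():
    io = FakeIO()
    return io.read()  # only the container is returned, as in the bug report

nwb = open_file()
gc.collect()
# the container's read_io reference kept the io (and its file) alive:
print(nwb.read_io.file_open)  # True
```

With this pattern, user code that returns only the container no longer loses the io object, and the file can still be closed deliberately via the cached reference when the container is no longer needed.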

oruebel avatar Jul 26 '23 05:07 oruebel