File handle resource leak
In virtualizarr/readers/common.py, the function maybe_open_loadable_vars_and_indexes has a file resource leak when a non-empty list of loadable_variables is supplied. In this case, an xarray Dataset is opened, but never closed (when loadable_variables is non-empty), and the dataset is not returned by the function, so the caller has no way of closing it either.
This is visible in numerous warnings emitted during unit testing, of the following form: RuntimeWarning: deallocating CachingFileManager
Issue #390 points out these warnings as well, but many of them cannot be addressed without refactoring the maybe_open_loadable_vars_and_indexes function to enable proper closing of the currently hidden dataset.
the dataset is not returned by the function, so the caller has no way of closing it either.
This is the problem - I wonder if we can set the ._close attribute on the Dataset that is returned by open_virtual_dataset such that it can be closed by the caller.
the dataset is not returned by the function, so the caller has no way of closing it either.
This is the problem - I wonder if we can set the
._closeattribute on theDatasetthat is returned byopen_virtual_datasetsuch that it can be closed by the caller.
I see that maybe_open_loadable_vars_and_indexes is called from the various implementations of open_virtual_dataset, and that each one then calls construct_virtual_dataset and returns that value.
Are you saying that we would modify maybe_open_loadable_vars_and_indexes to return the dataset it opens, and then set ._close on the dataset produced by construct_virtual_dataset to the dataset returned by maybe_open_loadable_vars_and_indexes before returning that dataset produced by construct_virtual_dataset?
Something along these lines?
...
ds, loadable_vars, indexes = maybe_open_loadable_vars_and_indexes(...)
vds = construct_virtual_dataset(...)
if ds is not None:
vds._close = ds.close
return vds
Or perhaps we could also modify construct_virtual_dataset to accept the (possibly None) dataset returned by maybe_open_loadable_vars_and_indexes and set ._close there, which would avoid a bit of code duplication.
Ah, I see the following comment inside construct_virtual_dataset:
# TODO we should probably also use vds.set_close() to tell xarray how to close the file we opened
Yes, but this would be much easier to do after #473.
I'm going to close this because we no longer have any deallocation warnings when running the tests. Please re-open if you see other issues.