VirtualiZarr icon indicating copy to clipboard operation
VirtualiZarr copied to clipboard

File handle resource leak

Open chuckwondo opened this issue 10 months ago • 4 comments

In virtualizarr/readers/common.py, the function maybe_open_loadable_vars_and_indexes has a file resource leak when a non-empty list of loadable_variables is supplied. In this case, an xarray Dataset is opened, but never closed (when loadable_variables is non-empty), and the dataset is not returned by the function, so the caller has no way of closing it either.

This is visible in numerous warnings emitted during unit testing, of the following form: RuntimeWarning: deallocating CachingFileManager

Issue #390 points out these warnings as well, but many of them cannot be addressed without refactoring the maybe_open_loadable_vars_and_indexes function to enable proper closing of the currently hidden dataset.

chuckwondo avatar Mar 03 '25 23:03 chuckwondo

the dataset is not returned by the function, so the caller has no way of closing it either.

This is the problem - I wonder if we can set the ._close attribute on the Dataset that is returned by open_virtual_dataset such that it can be closed by the caller.

TomNicholas avatar Mar 03 '25 23:03 TomNicholas

the dataset is not returned by the function, so the caller has no way of closing it either.

This is the problem - I wonder if we can set the ._close attribute on the Dataset that is returned by open_virtual_dataset such that it can be closed by the caller.

I see that maybe_open_loadable_vars_and_indexes is called from the various implementations of open_virtual_dataset, and that each one then calls construct_virtual_dataset and returns that value.

Are you saying that we would modify maybe_open_loadable_vars_and_indexes to return the dataset it opens, and then set ._close on the dataset produced by construct_virtual_dataset to the dataset returned by maybe_open_loadable_vars_and_indexes before returning that dataset produced by construct_virtual_dataset?

Something along these lines?

...
ds, loadable_vars, indexes = maybe_open_loadable_vars_and_indexes(...)
vds = construct_virtual_dataset(...)

if ds is not None:
    vds._close = ds.close

return vds

Or perhaps we could also modify construct_virtual_dataset to accept the (possibly None) dataset returned by maybe_open_loadable_vars_and_indexes and set ._close there, which would avoid a bit of code duplication.

chuckwondo avatar Mar 04 '25 12:03 chuckwondo

Ah, I see the following comment inside construct_virtual_dataset:

# TODO we should probably also use vds.set_close() to tell xarray how to close the file we opened

chuckwondo avatar Mar 04 '25 12:03 chuckwondo

Yes, but this would be much easier to do after #473.

TomNicholas avatar Mar 04 '25 16:03 TomNicholas

I'm going to close this because we no longer have any deallocation warnings when running the tests. Please re-open if you see other issues.

maxrjones avatar Jul 13 '25 17:07 maxrjones