datatree
Automatically close files using open_datatree context manager
In xarray it's possible to automatically close a dataset after use by opening it with a context manager. From the documentation:
Datasets have a Dataset.close() method to close the associated netCDF file. However, it’s often cleaner to use a with statement:
# this automatically closes the dataset after use
In [5]: with xr.open_dataset("saved_on_disk.nc") as ds:
   ...:     print(ds.keys())
   ...:
We currently don't have a DataTree.close() method, or any context manager behaviour for open_datatree. To add them, presumably we would need to iterate over all file handles (i.e. groups) and close them one by one.
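To make the idea concrete, here is a minimal sketch of closing every group's handle by walking the tree. `ToyTree` is a hypothetical stand-in written with the standard library only; it is not the real DataTree class, and the per-node `_close` callable is an assumption about how each group's closer might be stored.

```python
class ToyTree:
    """Hypothetical stand-in for DataTree: one node per group, each with its own closer."""

    def __init__(self, name, closer=None):
        self.name = name
        self.children = {}
        self._close = closer  # callable that releases this node's file handle (assumed)

    def add_child(self, child):
        self.children[child.name] = child
        return child

    def subtree(self):
        # Depth-first walk over this node and all of its descendants.
        yield self
        for child in self.children.values():
            yield from child.subtree()

    def close(self):
        # Close the handle of every node, one by one.
        for node in self.subtree():
            if node._close is not None:
                node._close()
                node._close = None  # a second close() call becomes a no-op

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


closed = []  # records which handles were released, in order
root = ToyTree("/", closer=lambda: closed.append("/"))
group_a = root.add_child(ToyTree("a", closer=lambda: closed.append("a")))
group_a.add_child(ToyTree("b", closer=lambda: closed.append("b")))

with root as tree:
    names = [node.name for node in tree.subtree()]

print(closed)
```

Clearing `_close` after calling it is a design choice in this sketch so that an explicit `close()` after the `with` block does not try to release the same handle twice.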
Related to #90 @jhamman @thewtex
Could there be a load_datatree() method to be consistent with xr.load_dataset()?
Could there be a load_datatree() method
Sure, once we have a .load() method too, writing a load_datatree() function would be simple, just like the code for xr.load_dataset() is simple.
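For illustration, the pattern behind xr.load_dataset() is roughly "open inside a with block, load eagerly, return". A sketch of the same pattern for a tree might look like the following. Everything here is hypothetical: `LazyTree`, `open_tree` and `load_tree` are toy stand-ins built on the standard library, since neither .load() nor context manager support exists on DataTree yet.

```python
class LazyTree:
    """Hypothetical stand-in for a lazily loaded tree backed by an open file."""

    def __init__(self, source):
        self._source = source  # pretend this is an open file handle
        self.data = None       # nothing has been read into memory yet
        self.closed = False

    def load(self):
        # Read everything into memory while the "file" is still open.
        if self.closed:
            raise RuntimeError("cannot load from a closed file")
        self.data = dict(self._source)
        return self

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def open_tree(source):
    """Hypothetical analogue of open_datatree that supports `with`."""
    return LazyTree(source)


def load_tree(source):
    """Hypothetical analogue of load_datatree: open, load eagerly, then close."""
    with open_tree(source) as tree:
        return tree.load()


tree = load_tree({"/": [1, 2], "/group": [3]})
print(tree.closed, tree.data)
```

The returned object holds its data in memory and its underlying handle is already released, which is the behaviour a load-style function is expected to give.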
Though we haven't implemented dask-specific methods yet.
@aurghs, @alexamici and @malmans2 - this issue and related backends issues seem like a good place for you guys to contribute if you wanted. You have expertise on xarray's backends, I don't, and they are pretty separable.
There are likely to be subtleties with respect to tracking multiple open file handles, and be aware that this will need to be done explicitly via a ._close attribute on DataTree after #41 moves that responsibility away from xarray.Dataset.
Sure - we are on it!
Is there any update on this issue or the context manager? The file stays locked after open_datatree, which is so annoying.
I am also interested in having this fixed. Can we reuse the logic from xr.open_mfdataset? We could collect the closers of all nodes and then assign a partial function to _close, as is done there. Or would you prefer to design a multicloser class?
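A rough sketch of the partial-function approach, assuming each node carries its own `_close` callable. `Node`, `iter_nodes` and `_multi_closer` are hypothetical names used only for illustration; they are not part of xarray or datatree.

```python
from functools import partial


def _multi_closer(closers):
    # Call every collected closer in turn.
    for closer in closers:
        closer()


class Node:
    """Hypothetical stand-in for a tree node that owns one file handle."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self._close = None


def iter_nodes(node):
    # Depth-first walk over a node and all of its descendants.
    yield node
    for child in node.children:
        yield from iter_nodes(child)


log = []  # records which closers ran, in order
root = Node("/", [Node("a", [Node("b")]), Node("c")])
for node in iter_nodes(root):
    node._close = partial(log.append, node.name)  # pretend per-file closer

# Collect the closers of all nodes, then hang one combined closer on the root.
closers = [node._close for node in iter_nodes(root) if node._close is not None]
root._close = partial(_multi_closer, closers)

root._close()
print(log)
```

The list of closers is built before `root._close` is reassigned, so the root's original closer is still called. A dedicated multicloser class could carry the same list and add extras such as error handling per closer, at the cost of more code.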