Automatically close files using open_datatree context manager

Open TomNicholas opened this issue 2 years ago • 6 comments

In xarray it's possible to close a dataset automatically by opening it with a context manager. From the documentation:

Datasets have a Dataset.close() method to close the associated netCDF file. However, it’s often cleaner to use a with statement:

# this automatically closes the dataset after use
In [5]: with xr.open_dataset("saved_on_disk.nc") as ds:
   ...:     print(ds.keys())
   ...: 

We currently don't have a DataTree.close() method, or any context manager behaviour for open_datatree. To add them, we would presumably need to iterate over all file handles (i.e. groups) and close them one by one.
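
A minimal sketch of what that could look like, using a toy stand-in rather than the real DataTree class. `ToyNode`, its `children` dict and its `_close` attribute are hypothetical names chosen for illustration; only the `with` protocol (`__enter__`/`__exit__`) is standard Python, and a real implementation would have to deal with actual backend file handles.

```python
from typing import Callable, Dict, Optional


class ToyNode:
    """Hypothetical stand-in for a tree node that may own an open file handle."""

    def __init__(self, name: str, closer: Optional[Callable[[], None]] = None):
        self.name = name
        self.children: Dict[str, "ToyNode"] = {}
        self._close = closer  # callable that releases this node's handle, if any

    def add_child(self, child: "ToyNode") -> "ToyNode":
        self.children[child.name] = child
        return child

    def close(self) -> None:
        # Close all descendants first, then this node's own handle.
        for child in self.children.values():
            child.close()
        if self._close is not None:
            self._close()
            self._close = None  # a second close() becomes a no-op

    def __enter__(self) -> "ToyNode":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always close; returning None lets any exception propagate.
        self.close()


# Usage: record the order in which the (fake) handles are released.
closed = []
root = ToyNode("root", closer=lambda: closed.append("root"))
group_a = root.add_child(ToyNode("group_a", closer=lambda: closed.append("group_a")))
group_a.add_child(ToyNode("group_b", closer=lambda: closed.append("group_b")))

with root as tree:
    print(sorted(tree.children))

print(closed)
```

Closing children before the parent is a design choice in this sketch, not something the issue prescribes.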

Related to #90 @jhamman @thewtex

TomNicholas avatar May 18 '22 15:05 TomNicholas

Could there be a load_datatree() method to be consistent with xr.load_dataset()?

jrmagers avatar May 19 '22 03:05 jrmagers

> Could there be a load_datatree() method

Sure, once we have a .load() method too then writing a load_datatree() function would be simple, just like the code for xr.load_dataset() is simple.

Though we haven't implemented dask-specific methods yet.

TomNicholas avatar May 19 '22 15:05 TomNicholas

@aurghs, @alexamici and @malmans2 - this issue and the related backend issues seem like a good place for you guys to contribute if you want to. You have expertise in xarray's backends, which I don't, and these issues are pretty separable.

There are likely to be subtleties with respect to tracking multiple open file handles, and be aware that this will need to be done explicitly via a ._close attribute on DataTree after #41 moves that responsibility away from xarray.Dataset.

TomNicholas avatar May 25 '22 15:05 TomNicholas

Sure - we are on it!

malmans2 avatar May 26 '22 07:05 malmans2

Is there any update on this issue or on the context manager? The file stays locked after calling open_datatree, which is very annoying.

wohenbushuang avatar Jun 15 '23 14:06 wohenbushuang

I am also interested in having this fixed. Can we reuse the logic in xr.open_mfdataset? We could collect the closers of all nodes and then assign a partial function to _close, as is done there. Or would you prefer to design a multi-closer class?
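
A rough sketch of that idea, assuming every node exposes an optional `_close` callable. As far as I understand it, `open_mfdataset` gathers each dataset's closer and registers a single combined one built with `functools.partial`; the code below mimics that on a toy tree. `ToyNode`, `iter_nodes`, `_multi_closer` and `attach_tree_closer` are illustrative names, not part of datatree or xarray.

```python
from functools import partial
from typing import Callable, Iterable, Iterator, List, Optional


class ToyNode:
    """Hypothetical tree node; `_close` is None when no file handle is attached."""

    def __init__(self, closer: Optional[Callable[[], None]] = None):
        self._close = closer
        self.children: List["ToyNode"] = []


def iter_nodes(node: ToyNode) -> Iterator[ToyNode]:
    # Depth-first walk over the node and all of its descendants.
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def _multi_closer(closers: Iterable[Callable[[], None]]) -> None:
    # Call every collected closer in order.
    for closer in closers:
        closer()


def attach_tree_closer(root: ToyNode) -> None:
    # Collect the existing closers first, then replace the root's closer
    # with one callable that runs all of them.
    closers = [n._close for n in iter_nodes(root) if n._close is not None]
    root._close = partial(_multi_closer, closers)


# Usage: two nodes with fake handles and one without.
log = []
root = ToyNode(closer=lambda: log.append("root"))
root.children.append(ToyNode(closer=lambda: log.append("child")))
root.children.append(ToyNode())
attach_tree_closer(root)
root._close()
print(log)
```

Note that this sketch does not guard against `_close` being called twice; a multi-closer class could track that state, which is one argument for the class-based alternative.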

ghiggi avatar Jul 31 '23 11:07 ghiggi