anndata
anndata copied to clipboard
Unclosed HDF5 file handle
The HDF5 file object is not explicitly closed here. In some cases, it causes the file to be locked even after adata.file.close()
. It can be fixed by using the context manager.
https://github.com/theislab/anndata/blob/283b0c13b235b41a9c8fd0cb3c4712698858aa33/anndata/_io/h5ad.py#L134
Could you provide an example of this bug? I'm not able to reproduce with this:
import anndata as ad, numpy as np
import h5py
mem = ad.AnnData(np.ones((5, 5)))
mem.write("tmp.h5ad")
backed = ad.read_h5ad("tmp.h5ad", backed="r+")
backed.file.close()
with h5py.File("tmp.h5ad", "r+") as f:
f["X"][0] = 0
This doesn't cause problems in general, as the file handle will be closed when it goes out of scope. However, with the new file spec
feature, developers are able to customize read_elem
, in which the file handle could potentially be used and cause it never closed (you can't explicitly close it later either as read_elem
does not get the reference to the file handle itself).
Nevertheless, I think it is a good idea to explicitly close any file handles that are no longer needed.
https://github.com/theislab/anndata/blob/283b0c13b235b41a9c8fd0cb3c4712698858aa33/anndata/_io/h5ad.py#L146
I'm not sure I'm getting your point.
The file isn't closed in read_h5ad_backed
because we still want to have access to that file in backed mode. There's overhead to opening and closing the file, which we don't want to have for each subsequent access.
There is a lock which could be applied, but this is only if you're keeping write access.
In the last line, you created a new file object using AnnData(**d)
through the file manager. And I think you didn't give the ownership of the old file object f
created in read_h5ad_backed
to any other functions. So there is no way to call f.close()
later. Correct me if I'm wrong.
So there is no data owns the f
created in read_h5ad_backed
. But read_elem
could potentially own a reference to groups in f
, which prevents the f
from being automatically closed.
Giving the ownership of f
to the file manager in AnnData
will solve the problem as well. But right now the file manager is created by opening a new file rather than reusing the existing file object.
I can devise an example later when I have time.
Ah, I see what you mean now.
It would be reasonable to close this there then, but probably more reasonable to pass through the handle.
However, since the ability to overload read_elem
isn't public yet hopefully there are minimal ill effects current. My hope is by the time we make that ability public we will have some better solutions for out of core access.
Is there a chance this close could fix the https://github.com/scverse/anndata/issues/522 issue?
This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!