anndata icon indicating copy to clipboard operation
anndata copied to clipboard

Unclosed HDF5 file handle

Open kaizhang opened this issue 2 years ago • 7 comments

The HDF5 file object is not explicitly closed here. In some cases, it causes the file to be locked even after adata.file.close(). It can be fixed by using the context manager.

https://github.com/theislab/anndata/blob/283b0c13b235b41a9c8fd0cb3c4712698858aa33/anndata/_io/h5ad.py#L134

kaizhang avatar Mar 05 '22 07:03 kaizhang

Could you provide an example of this bug? I'm not able to reproduce with this:

import anndata as ad, numpy as np
import h5py

mem = ad.AnnData(np.ones((5, 5)))
mem.write("tmp.h5ad")
backed = ad.read_h5ad("tmp.h5ad", backed="r+")
backed.file.close()

with h5py.File("tmp.h5ad", "r+") as f:
    f["X"][0] = 0

ivirshup avatar Mar 08 '22 11:03 ivirshup

This doesn't cause problems in general, as the file handle will be closed when it goes out of scope. However, with the new file spec feature, developers are able to customize read_elem, in which the file handle could potentially be used and cause it never closed (you can't explicitly close it later either as read_elem does not get the reference to the file handle itself).

Nevertheless, I think it is a good idea to explicitly close any file handles that are no longer needed.

https://github.com/theislab/anndata/blob/283b0c13b235b41a9c8fd0cb3c4712698858aa33/anndata/_io/h5ad.py#L146

kaizhang avatar Mar 08 '22 16:03 kaizhang

I'm not sure I'm getting your point.

The file isn't closed in read_h5ad_backed because we still want to have access to that file in backed mode. There's overhead to opening and closing the file, which we don't want to have for each subsequent access.

There is a lock which could be applied, but this is only if you're keeping write access.

ivirshup avatar Mar 08 '22 16:03 ivirshup

In the last line, you created a new file object using AnnData(**d) through the file manager. And I think you didn't give the ownership of the old file object f created in read_h5ad_backed to any other functions. So there is no way to call f.close() later. Correct me if I'm wrong.

So there is no data owns the f created in read_h5ad_backed. But read_elem could potentially own a reference to groups in f, which prevents the f from being automatically closed.

Giving the ownership of f to the file manager in AnnData will solve the problem as well. But right now the file manager is created by opening a new file rather than reusing the existing file object.

kaizhang avatar Mar 08 '22 16:03 kaizhang

I can devise an example later when I have time.

kaizhang avatar Mar 08 '22 17:03 kaizhang

Ah, I see what you mean now.

It would be reasonable to close this there then, but probably more reasonable to pass through the handle.

However, since the ability to overload read_elem isn't public yet hopefully there are minimal ill effects current. My hope is by the time we make that ability public we will have some better solutions for out of core access.

ivirshup avatar Mar 08 '22 17:03 ivirshup

Is there a chance this close could fix the https://github.com/scverse/anndata/issues/522 issue?

Hrovatin avatar Aug 25 '22 09:08 Hrovatin

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

github-actions[bot] avatar Aug 15 '23 02:08 github-actions[bot]