data icon indicating copy to clipboard operation
data copied to clipboard

Is our handling of open files safe?

Open pmeier opened this issue 2 years ago • 2 comments

Our current strategy is to wrap all file handles in a StreamWrapper. It dispatches all calls to wrapped object and adds a __del__ method:

class StreamWrapper:
    def __init__(self, file_obj):
        self.file_obj = file_obj

    def __del__(self):
        try:
            self.file_obj.close()
        except Exception:
            pass

It will be called as soon as there are no more references to instance. The rationale is that if this happens we can close the wrapped file object. Since the StreamWrapper has a reference to the file object, GC should never try to delete the file object before __del__ of the StreamWrapper is called. Thus, we should never delete an open file object.

Unfortunately, the reasoning above seems not to be correct. In some cases, it seems GC will delete the file object before the StreamWrapper is deleted. This will emit a warning which the torchvision test suite will turn into an error. This was discussed at length in pytorch/vision#5801 and includes minimum requirements to reproduce the issue. Still, there was no minimal reproduction outside of the test environment found. The issue was presumably fixed in pytorch/pytorch#76345, but was popping up again in https://github.com/pytorch/data/runs/6500848588#step:9:1977.

Thus, I think it is valid question to ask if our approach is safe at all. It would be a quite bad UX if a user gets a lot of unclosed file warnings although they used torchdata or in extension torchvision.datasets as documented.

pmeier avatar May 23 '22 10:05 pmeier

Just found a tool that might help us to identify what's wrong with StreamWrapper. https://refcycle.readthedocs.io/en/latest/

This tool can help us to visualize the refcount of nested objects.

ejguan avatar Jun 13 '22 14:06 ejguan

It will be called as soon as there are no more references to instance. The rationale is that if this happens we can close the wrapped file object. Since the StreamWrapper has a reference to the file object, GC should never try to delete the file object before __del__ of the StreamWrapper is called. Thus, we should never delete an open file object.

Not exactly what is happening inside the tests. It is quite opposite, the testing environment keeping additional ref counter to the open file handler. This makes StreamWrapper point to a non-deletable object and __del__ is never invoked to close the file. I have a stack of PRs to fix this issue. The last one of them is https://github.com/pytorch/vision/pull/6128

VitalyFedyunin avatar Jul 06 '22 18:07 VitalyFedyunin

I am closing this issue since I believe it has been fixed.

ejguan avatar Jan 05 '23 15:01 ejguan