Albert Zeyer
I was playing around with iterating through all alive objects at the end, and that also triggers the crash. Something like this:

```python
print("**** remaining objects:")
import gc
for obj...
```
With python3-dbg I get some more detail:

```
Thread 1 "python3.10" received signal SIGSEGV, Segmentation fault.
0x000055555567fb51 in _PyObject_IS_GC (obj=) at ../Include/internal/pycore_object.h:166
166     ../Include/internal/pycore_object.h: No such file or directory.
(gdb) bt
#0  0x000055555567fb51...
```
With:

```python
print("**** remaining objects:")
import gc
for obj in gc.get_objects():
    print("0x%x" % id(obj), type(obj), obj)
print("**** done.")
```

Another variant of the crash:

```
Thread 1 "python3.10" received signal...
```
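For reference, a self-contained variant of that dump snippet (stdlib only, no torch; the `repr` guard and the `limit` are my additions, and of course a Python-level `try`/`except` cannot catch the actual segfault, which happens in C code):

```python
import gc


def dump_alive_objects(limit=20):
    """Print id, type and (guarded, truncated) repr of GC-tracked objects."""
    dumped = []
    print("**** remaining objects:")
    for obj in gc.get_objects()[:limit]:
        try:
            s = repr(obj)[:80]
        except Exception:
            # repr() itself can raise on odd objects; this does NOT help
            # against the segfault above, which is inside the interpreter.
            s = "<repr failed>"
        line = "0x%x %s %s" % (id(obj), type(obj).__name__, s)
        print(line)
        dumped.append(line)
    print("**** done.")
    return dumped


lines = dump_alive_objects()
```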
Note: this object you see here in `object_str` looks very much like the `__dict__` of a `saved_tensors_hooks` instance, which has `pack_hook` and `unpack_hook`.
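To illustrate why such a dict points at a `saved_tensors_hooks` instance: the context manager stores the two hooks as instance attributes, so its `__dict__` is a dict with exactly the keys `pack_hook` and `unpack_hook`. A torch-free dummy (illustration only, not the actual torch class):

```python
class DummySavedTensorsHooks:
    """Minimal stand-in for torch.autograd.graph.saved_tensors_hooks."""

    def __init__(self, pack_hook, unpack_hook):
        # Storing the hooks as attributes is what makes the instance
        # __dict__ look like {"pack_hook": ..., "unpack_hook": ...}.
        self.pack_hook = pack_hook
        self.unpack_hook = unpack_hook


h = DummySavedTensorsHooks(lambda x: x, lambda x: x)
print(sorted(h.__dict__.keys()))  # → ['pack_hook', 'unpack_hook']
```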
Ok, I added this `print` in `gradient_checkpoint_scope.__init__` to print the address of the `pack_hook` method:

```python
def __init__(self):
    self.record_graph_scope = _RecordGraph()
    self.record_graph_scope.graph.gradient_checkpoint_scope_backref = self
    # Note: saved_tensors_hooks is thread local....
```
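One pitfall when printing the "address" of a method like `pack_hook`: every attribute access on an instance creates a fresh bound-method object, so `id(self.pack_hook)` identifies that temporary object, not the underlying function. A stdlib sketch of the gotcha:

```python
class C:
    def m(self):
        pass


c = C()
# Two attribute accesses create two distinct bound-method objects:
print(c.m is c.m)  # → False
# The underlying function object is stable, so print/compare that instead:
print(c.m.__func__ is C.m)  # → True
print("0x%x" % id(C.m))
```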
Added some debug code:

```python
def _custom_saved_tensors_hooks_exit(
    self: torch.autograd.graph.saved_tensors_hooks, exc_type=None, exc_val=None, exc_tb=None
):
    print(f"*** _custom_saved_tensors_hooks_exit, stack {_custom_saved_tensors_hooks_tls_ctx.stack}")
    f = sys._getframe()
    while f:
        co = f.f_code
        print("-", co.co_name, co.co_filename, f.f_lineno)
        f...
```
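The frame-walking part of that debug code, as a self-contained helper (same `sys._getframe` / `f_back` technique, here also collecting the function names so the result can be inspected programmatically):

```python
import sys


def stack_names():
    """Walk the Python frame stack upward and print each frame,
    like the debug print above; also return the function names."""
    names = []
    f = sys._getframe(1)  # skip this helper's own frame
    while f:
        co = f.f_code
        names.append(co.co_name)
        print("-", co.co_name, co.co_filename, f.f_lineno)
        f = f.f_back
    return names


def outer():
    return stack_names()


print(outer())
```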
I have a standalone test case:

```python
def test_saved_tensors_hooks_gc_segfault2():
    # https://github.com/rwth-i6/returnn/issues/1581
    shape = (101, 103)
    for i in range(10):
        v1 = torch.nn.Parameter(torch.randn(shape))
        v2 = torch.nn.Parameter(torch.randn(shape))

        class _Handler:
            def __init__(self, exit_in_unpack:...
```
Slightly different version:

```python
def test_saved_tensors_hooks_gc_segfault2():
    # https://github.com/rwth-i6/returnn/issues/1581
    shape = (101, 103)
    for i in range(10):
        print("**** iter", i)
        v = torch.nn.Parameter(torch.randn(shape))

        class _Handler:
            def __init__(self):
                self.scope = torch.autograd.graph.saved_tensors_hooks(self._pack_hook, self._unpack_hook)...
```
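The shape of the pattern these test cases exercise, with the torch specifics replaced by a dummy context manager (pure Python, so it obviously does not crash; this only illustrates the reentrancy, i.e. exiting the scope from inside one of its own hooks):

```python
class DummyHooksCtx:
    """Stand-in for torch.autograd.graph.saved_tensors_hooks."""

    def __init__(self, pack_hook, unpack_hook):
        self.pack_hook, self.unpack_hook = pack_hook, unpack_hook
        self.entered = False

    def __enter__(self):
        self.entered = True
        return self

    def __exit__(self, *exc):
        self.entered = False


class _Handler:
    def __init__(self):
        self.scope = DummyHooksCtx(self._pack_hook, self._unpack_hook)

    def _pack_hook(self, x):
        return x

    def _unpack_hook(self, x):
        # The problematic part: exiting the scope from inside the hook itself.
        self.scope.__exit__(None, None, None)
        return x


h = _Handler()
with h.scope:
    packed = h.scope.pack_hook(42)
unpacked = h.scope.unpack_hook(packed)
print(unpacked)  # → 42
```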
I reported that upstream: https://github.com/pytorch/pytorch/issues/130734
I pushed a workaround now. See `_can_exit_saved_tensors_hooks_inside_hooks`. If possible, I would like to extend this logic later, but let's wait for the response in https://github.com/pytorch/pytorch/issues/130734.