
Tensor deepcopy does not copy raw_tensor

Open · albertz opened this issue Jun 17 '24

deepcopy on Tensor will not copy the raw_tensor:

def __getstate__(self):
    d = {k: getattr(self, k) for k in self.__slots__}
    d["_raw_tensor"] = None  # do not store the TF tensors
    return d

def __setstate__(self, state):
    for k, v in state.items():
        setattr(self, k, v)
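
To illustrate the mechanism with a minimal standalone sketch (plain Python, not RETURNN code; the _Holder class is made up for illustration): copy.deepcopy goes through __getstate__/__setstate__, so resetting _raw_tensor in __getstate__ also drops it in every deep copy.

import copy


class _Holder:
    """Toy stand-in for Tensor, only to demonstrate the deepcopy mechanism."""

    __slots__ = ("name", "_raw_tensor")

    def __init__(self, name, raw_tensor):
        self.name = name
        self._raw_tensor = raw_tensor

    def __getstate__(self):
        d = {k: getattr(self, k) for k in self.__slots__}
        d["_raw_tensor"] = None  # same pattern as Tensor.__getstate__ above
        return d

    def __setstate__(self, state):
        for k, v in state.items():
            setattr(self, k, v)


obj = _Holder("x", raw_tensor=[1.0, 2.0, 3.0])
obj_ = copy.deepcopy(obj)  # deepcopy uses __getstate__/__setstate__ via __reduce_ex__
assert obj._raw_tensor is not None
assert obj_._raw_tensor is None  # dropped in the copy, just like Tensor.raw_tensor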

Why is that?

First of all, the reason should be documented, because I don't exactly remember anymore why we do it this way.

But further, this causes problems in RF modules which keep some auxiliary constant tensors around, e.g. rf.PiecewiseLinear. If there is a good reason to keep the current behavior, we need to think about how to solve it in rf.PiecewiseLinear. One solution is to just use rf.Parameter (with auxiliary=True) instead. But a developer might run into this problem again, and the error was very confusing: in __init__, I double-checked that raw_tensor was set, but then in __call__, raw_tensor was not set anymore, which caused raw_backend to be None, so many RF functions failed with AttributeError: 'NoneType' object has no attribute ....

Maybe it makes sense to make the behavior controllable (e.g. via a context scope), so that one can switch between copying raw_tensor and not copying it. But first we should understand the reason why we do not copy it.
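
As a rough sketch of that idea (everything here is hypothetical, none of these names exist in RETURNN), such a context scope could be a module-level flag that Tensor.__getstate__ consults:

import contextlib

_deepcopy_keeps_raw_tensor = False  # hypothetical module-level flag


@contextlib.contextmanager
def deepcopy_with_raw_tensor():
    """Hypothetical scope: within it, Tensor.__getstate__ would keep raw_tensor."""
    global _deepcopy_keeps_raw_tensor
    prev = _deepcopy_keeps_raw_tensor
    _deepcopy_keeps_raw_tensor = True
    try:
        yield
    finally:
        _deepcopy_keeps_raw_tensor = prev


# Tensor.__getstate__ would then reset raw_tensor only conditionally:
#     if not _deepcopy_keeps_raw_tensor:
#         d["_raw_tensor"] = None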

(Maybe we can just make a dummy draft PR where we remove this line of code, i.e. where we do copy it, and see which tests fail...)
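
Concretely, such a draft would roughly just drop the reset line in the __getstate__ shown above:

def __getstate__(self):
    d = {k: getattr(self, k) for k in self.__slots__}
    # d["_raw_tensor"] = None  # removed, so raw_tensor would be deep-copied as well
    return d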

Test case (e.g. for test_torch_frontend.py):

def test_module_deepcopy_tensor():
    import copy

    # These are normally available as module-level imports in test_torch_frontend.py;
    # repeated here so the snippet is self-contained.
    import numpy as np
    import torch

    import returnn.frontend as rf
    from returnn.tensor import Dim

    class _MyModule(rf.Module):
        def __init__(self):
            super().__init__()
            self.tensor = rf.convert_to_tensor(np.array([1.0, 2.0, 3.0], dtype=np.float32), dims=[Dim(3)])

    mod = _MyModule()
    assert isinstance(mod.tensor.raw_tensor, torch.Tensor)
    mod_ = copy.deepcopy(mod)
    assert isinstance(mod_.tensor.raw_tensor, torch.Tensor)  # currently fails: raw_tensor is not copied

albertz · Jun 17 '24

Why is that?

I assume that when I wrote this, I was mostly thinking about rf.Parameter, where you never want to copy the content but rather the ParamInit (which could be raw values, in which case they would be copied, but usually it is some random init scheme, so only the scheme is copied, not the values).
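
As a toy illustration of that intended behavior (made-up code, not the RETURNN API): the init scheme is part of the copied state, while the materialized values are dropped and re-created from the scheme.

import copy

import numpy as np


class _ToyParam:
    """Made-up stand-in for rf.Parameter, only to show the copy-the-scheme idea."""

    def __init__(self, shape, init_scheme):
        self.shape = shape
        self.init_scheme = init_scheme  # e.g. some random-init scheme
        self.values = init_scheme(shape)  # materialized raw values

    def __getstate__(self):
        d = dict(self.__dict__)
        d["values"] = None  # copy the scheme, not the materialized values
        return d

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.values = self.init_scheme(self.shape)  # re-materialize from the scheme


p = _ToyParam((3,), init_scheme=lambda shape: np.random.normal(size=shape))
p_ = copy.deepcopy(p)
assert p_.values is not None and p_.values is not p.values  # fresh values, same scheme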

Also see for example this: https://github.com/pytorch/pytorch/issues/86274

albertz · Jul 15 '24

Maybe the more reasonable behavior is that raw_tensor is always copied, and if there are cases/reasons where we do not want to copy it, we might need a context scope specifically for that (or maybe we can detect those cases automatically).

albertz · Sep 19 '25