
Efficient zero-filled tensors

ezyang opened this issue 6 years ago · 7 comments

We need a way to efficiently represent a tensor with a concrete type and shape, but which is filled with all zeros, without actually materializing the tensor in question. The primary motivation is the backward computation when not all outputs have gradients: at the point we invoke the backward function, some of the grad_outputs do not exist, yet the backward function still takes them as inputs. Today, you have two options:

  1. Pass in undefined. The problem is that undefined tensors don't support any operations and don't carry type or shape information, so the backward code now has to handle a combinatorial set of defined/undefined input permutations; and if it really does need the type/shape of a missing gradient, that information has to be passed through a side channel.

  2. Pass in a zero-filled tensor. This is morally the correct thing to do, mathematically speaking. The problem is that you are now materializing possibly giant zero-filled tensors that you are not actually going to use in any meaningful way. (Both options are sketched below.)
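As a rough illustration of the trade-off, here is a sketch using today's Python custom-function API (the SplitTwice op is hypothetical, and ctx.set_materialize_grads is a later addition that toggles between behaviors 1 and 2):

import torch
from torch.autograd import Function

class SplitTwice(Function):                  # hypothetical op with two outputs
    @staticmethod
    def forward(ctx, x):
        # The default materializes zero-filled tensors for missing
        # grad_outputs (option 2); setting this to False hands the
        # backward a None instead (option 1).
        ctx.set_materialize_grads(False)
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, grad_out1, grad_out2):
        # Option 1 style: branch on every defined/undefined combination,
        # with no type/shape information available for the missing gradients.
        grad = None
        if grad_out1 is not None:
            grad = 2 * grad_out1
        if grad_out2 is not None:
            grad = 3 * grad_out2 if grad is None else grad + 3 * grad_out2
        return grad

x = torch.randn(3, requires_grad=True)
a, b = SplitTwice.apply(x)
a.sum().backward()                           # b is unused, so grad_out2 is None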

Today, we take a hybrid approach in PyTorch:

  1. For Python autograd functions, we materialize and pass in zero-filled tensors (option 2).
  2. For C++ autograd functions, we pass in undefined tensors (option 1). The C++ code is expected to handle the case where its inputs are undefined, which results in awful code; see for example: https://github.com/pytorch/pytorch/blob/master/tools/autograd/templates/Functions.cpp#L264-L272

It is not entirely clear what a good, simple implementation strategy for zero-filled tensors in ATen would be, since such a tensor kind increases the number of input combinations that implementations need to support.

CC @zdevito @gchanan @colesbury

ezyang avatar Nov 08 '17 02:11 ezyang

Is there something wrong with Scalar(0).toTensor().expand(the_size)?
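In the Python API the same trick looks roughly like this (a sketch; the shape is just a placeholder):

import torch

the_size = (1000, 1000)                    # whatever shape the missing gradient needs
z = torch.tensor(0.).expand(the_size)      # rough Python analogue of the suggestion

print(z.shape)     # torch.Size([1000, 1000])
print(z.stride())  # (0, 0): every element aliases the single stored zero,
                   # so no 1000x1000 buffer is ever allocated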

zdevito avatar Nov 08 '17 03:11 zdevito

Hmm, that does seem like it should work :)

ezyang avatar Nov 08 '17 05:11 ezyang

One possible caveat is that some operations (BLAS, cuDNN) require contiguous tensors, so an extra .clone() might be needed in some places. Also, for some functions it is useful to know that a gradient is all zeros, so that we can skip some operations during the backward pass.
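A minimal sketch of the contiguity caveat (the shape is arbitrary):

import torch

z = torch.tensor(0.).expand(4, 4)       # zero-strided stand-in for a zero gradient

print(z.is_contiguous())      # False: kernels that require contiguous inputs
                              # (BLAS, cuDNN) cannot consume this view directly
dense = z.contiguous()        # .contiguous()/.clone() materializes a real 4x4 buffer
print(dense.is_contiguous())  # True, but the memory savings are gone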

albanD avatar Nov 08 '17 10:11 albanD

Also, 0-strided tensors don't work in in-place operations.

gchanan avatar Nov 08 '17 16:11 gchanan

To elaborate on @gchanan's comment: if you have x += y (as frequently occurs during gradient calculation), the semantics differ depending on whether x is a 0-dim or an n-dim zero-filled tensor. 0-dim in-place addition will be rejected unless y is also 0-dim, but an n-dim zero-filled tensor will work in all cases.

ezyang avatar Nov 08 '17 20:11 ezyang

Is it actually rejected? I seem to remember it just gives you the wrong answer.

gchanan avatar Nov 08 '17 20:11 gchanan

You're right, it's not rejected.

>>> x = torch.Tensor([0]).expand(2, 2)   # every element aliases the single stored 0
>>> x += torch.Tensor([[1,2],[3,4]])     # each write hits that one element: 1+2+3+4 = 10
>>> x

 10  10
 10  10
[torch.FloatTensor of size 2x2]

ezyang avatar Nov 08 '17 20:11 ezyang