
Feature request: dense-sparse-dense bilinear

Open Linux-cpp-lisp opened this issue 4 years ago • 2 comments

General bilinear operations take the form x^T @ A @ y, where A has shape (outdim, xdim, ydim) and each output channel o is x^T @ A[o] @ y (see https://pytorch.org/docs/stable/generated/torch.nn.Bilinear.html).
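For reference, the dense version of this is what `torch.nn.functional.bilinear` computes (a minimal example, not taken from anything in the library):

```python
import torch
import torch.nn.functional as F

batch, dx, dy, dout = 8, 16, 32, 4
x = torch.randn(batch, dx)
y = torch.randn(batch, dy)
A = torch.randn(dout, dx, dy)   # nn.Bilinear weight layout: (outdim, xdim, ydim)

# out[b, o] = x[b] @ A[o] @ y[b] for every output channel o
out = F.bilinear(x, y, A)       # [batch, dout]
```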

Currently, as far as I can tell, implementing this for a sparse A with pytorch_sparse is only possible by flattening A to a 2D matrix and doing a sparse-dense matmul with the outer product of x and y. Materializing that outer product is a huge memory cost, and when A is sparse, most of the multiplications that build it never contribute to the result.
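A rough sketch of that workaround (my example, assuming `SparseTensor.from_dense` and `matmul` behave as I expect):

```python
import torch
from torch_sparse import SparseTensor, matmul

batch, dx, dy, dout = 8, 16, 32, 4
x = torch.randn(batch, dx)
y = torch.randn(batch, dy)
A = torch.randn(dout, dx, dy) * (torch.rand(dout, dx, dy) > 0.9)  # mostly-zero A

# SparseTensor must be 2D, so flatten A to [dout, dx * dy]
A_sp = SparseTensor.from_dense(A.reshape(dout, dx * dy))

# Materialize the outer product of x and y: [batch, dx * dy] -- this is the memory cost
xy = torch.einsum('bi,bj->bij', x, y).reshape(batch, dx * dy)

out = matmul(A_sp, xy.t()).t()  # [batch, dout]
assert torch.allclose(out, torch.einsum('bi,oij,bj->bo', x, A, y), atol=1e-5)
```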

It seems to me like supporting this computation directly would not require dramatic changes to the existing CPU and CUDA matmul code — seemingly just a second outer loop over x in addition to y. But I'm not really familiar enough with the code to see if that's true.

Ideally, like existing sparse-dense multiplication, it would support a batch dimension in x and y:

x=[batch, dx], A=[dx, dy, dout], y=[batch, dy] --> [batch, dout]
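In einsum terms (my notation for the shapes above):

```python
import torch

batch, dx, dy, dout = 8, 16, 32, 4
x = torch.randn(batch, dx)       # [batch, dx]
y = torch.randn(batch, dy)       # [batch, dy]
A = torch.randn(dx, dy, dout)    # [dx, dy, dout], i.e. transposed relative to nn.Bilinear's weight

# out[b, o] = sum_{i, j} x[b, i] * A[i, j, o] * y[b, j]
out = torch.einsum('bi,ijo,bj->bo', x, A, y)   # [batch, dout]
```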

I would be happy to help with implementing this if you think it is doable and something you'd be interested in having in the library!

Linux-cpp-lisp avatar May 14 '21 22:05 Linux-cpp-lisp

That's interesting. I think one current limitation is that SparseTensor needs to be two-dimensional. As such, I'm not really sure if we can directly re-use CPU and CUDA kernels for this. WDYT?

rusty1s avatar May 17 '21 05:05 rusty1s

Hm right — I think any such bilinear form can be expressed as a linear map applied to the outer product of x and y, i.e., as a matmul between a flattened (2D) version of A and the flattened outer product. So the sparse tensor A can stay a 2D matrix; what you'd need is for the compute kernel to generate the required elements of the outer product on the fly.
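To make that concrete, here is a pure-PyTorch sketch (my own, not an existing pytorch_sparse API) of generating the outer-product elements on the fly from the COO entries of the flattened A:

```python
import torch

def bilinear_from_flat_coo(row, col, value, x, y, dout, dy):
    # row, col, value: COO entries of the flattened A_flat of shape [dout, dx * dy],
    # where col = i * dy + j encodes the (i, j) index of the original 3D A[dout, dx, dy].
    # x: [batch, dx], y: [batch, dy]
    i, j = col // dy, col % dy                          # recover (i, j) on the fly
    contrib = value.unsqueeze(0) * x[:, i] * y[:, j]    # [batch, nnz]: only the outer-product entries A touches
    out = x.new_zeros(x.size(0), dout)
    out.index_add_(1, row, contrib)                     # scatter-add each nonzero into its output channel
    return out

# Quick check against the dense reference
batch, dx, dy, dout = 8, 16, 32, 4
x, y = torch.randn(batch, dx), torch.randn(batch, dy)
A = torch.randn(dout, dx, dy) * (torch.rand(dout, dx, dy) > 0.9)
A_flat = A.reshape(dout, dx * dy)
row, col = A_flat.nonzero(as_tuple=True)
out = bilinear_from_flat_coo(row, col, A_flat[row, col], x, y, dout, dy)
assert torch.allclose(out, torch.einsum('bi,oij,bj->bo', x, A, y), atol=1e-5)
```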

A quick look at the CPU kernel (https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cpu/spmm_cpu.cpp#L84) makes it seem like you're already doing random access on the dense matrix, so extending that to random access on two dense tensors doesn't seem like a big deal. OTOH, I don't know any CUDA and have no idea what that looks like in the CUDA kernels, so it may be much more complicated than for the CPU kernel.
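In loop form (again just my illustration, not the actual kernel), the per-nonzero work would look like this, with one lookup into x and one into y instead of a single lookup into the dense matrix:

```python
import torch

def bilinear_coo_loop(index, value, x, y, dout):
    # index: LongTensor [3, nnz] holding (o, i, j) for each nonzero of the conceptual A[dout, dx, dy]
    # value: Tensor [nnz]; x: [batch, dx]; y: [batch, dy]
    out = x.new_zeros(x.size(0), dout)
    o, i, j = index
    for n in range(value.numel()):
        # one random access into x and one into y per nonzero
        out[:, o[n]] += value[n] * x[:, i[n]] * y[:, j[n]]
    return out
```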

I'm trying a lot of very different approaches to accelerating this strange computation — my A is very sparse but also has a lot of repeated structure, so it's not obvious that explicitly storing it in COO format is actually the best way to go — but I can't rule anything out either :smile:

Linux-cpp-lisp avatar May 20 '21 06:05 Linux-cpp-lisp