IndexedConv
Memory usage
Originally opened by Mark Schoene: https://gitlab.lapp.in2p3.fr/GammaLearn/GammaLearn/issues/31
While investigating why indexedconv uses much more memory than the built-in conv and is noticeably slower, I found:
- https://discuss.pytorch.org/t/matmul-broadcasting-makes-copies/19494
- https://discuss.pytorch.org/t/memory-inefficient-in-batch-matrix-multiplication-with-autograd/28164
I'm running some tests to confirm that this is the problem.
The memory consumption observed with indexedconv comes from the matmul function, which broadcasts its operands before applying the matrix multiplication. The weight matrix has to be expanded to match the batch size, which also expands the autograd graph. The alternative is to compute the matrix multiplication in a for loop over the batch dimension (as the CUDA/C++ implementation of convolution does), but then speed becomes the issue, since a Python for loop is slow. A minimal sketch illustrating the two approaches follows below.
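Here is a minimal sketch of the broadcasting issue, not the actual IndexedConv code: the shapes and variable names (`weight`, `cols`, etc.) are hypothetical, assuming an im2col-style gathered input multiplied by a flattened weight matrix.

```python
import torch

# Hypothetical shapes chosen only for illustration.
batch, out_c, in_c, k, n_pix = 32, 64, 32, 7, 1855
weight = torch.randn(out_c, in_c * k, requires_grad=True)
cols = torch.randn(batch, in_c * k, n_pix)  # gathered, im2col-style input

# Broadcasting version: matmul expands `weight` to (batch, out_c, in_c*k)
# before multiplying, so a batch-sized copy of the weight is kept for autograd.
out_broadcast = torch.matmul(weight, cols)  # shape (batch, out_c, n_pix)

# Loop version: multiply one sample at a time, so `weight` is never expanded,
# at the cost of a slow Python loop over the batch.
out_loop = torch.stack([weight @ cols[b] for b in range(batch)])

assert torch.allclose(out_broadcast, out_loop, atol=1e-5)
```

Both variants compute the same result; the trade-off is memory (broadcast copies in the autograd graph) versus speed (Python-level iteration over the batch).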
Hi @mikael10j. Does that solve the memory usage entirely, then? (You might still have some small overhead with Python compared to the CUDA version.)
I am confident we can optimise the batch loop.
Yes, it does.
Great news. Could you share the code in a PR please?
See PR #20.