
Memory usage

mikael10j opened this issue 6 years ago • 6 comments

Originally opened by Mark Schoene: https://gitlab.lapp.in2p3.fr/GammaLearn/GammaLearn/issues/31

mikael10j avatar Dec 12 '18 14:12 mikael10j

While investigating why indexedconv uses much more memory than the built-in conv and is also noticeably slower, I found:

  • https://discuss.pytorch.org/t/matmul-broadcasting-makes-copies/19494
  • https://discuss.pytorch.org/t/memory-inefficient-in-batch-matrix-multiplication-with-autograd/28164

I'm running some tests to confirm that this is indeed the problem.
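Here is the kind of minimal check I have in mind (an untested sketch; the shapes are purely illustrative, and it assumes a CUDA device for the memory counters):

```python
import torch

# Illustrative shapes, not the actual IndexedConv ones.
batch, in_feat, out_feat, npix = 32, 225, 64, 1024
cols = torch.randn(batch, in_feat, npix, device='cuda', requires_grad=True)
weight = torch.randn(out_feat, in_feat, device='cuda', requires_grad=True)

# matmul broadcasts `weight` against the batch dimension of `cols`,
# and the expanded operand stays alive for the backward pass.
out = torch.matmul(weight, cols)   # (batch, out_feat, npix)
out.sum().backward()
print(torch.cuda.max_memory_allocated() / 2**20, 'MiB')
```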

mikael10j avatar Jan 11 '19 15:01 mikael10j

The memory consumption observed with indexedconv is due to the matmul function, which broadcasts its operands before applying the matrix multiplication. The weight matrix has to be expanded to match the batch size, which also expands the autograd graph. One solution is to compute the matrix multiplication in a for loop over the batch (as done in the CUDA/C++ implementation of convolution), but then time becomes an issue (a Python for loop is slow).
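For illustration, the loop-based version looks roughly like this (a sketch with made-up shapes, not the actual IndexedConv code):

```python
import torch

def conv_as_loop(weight, cols):
    # weight: (out_feat, in_feat); cols: (batch, in_feat, npix).
    # Multiplying one sample at a time keeps every matmul 2-D, so
    # `weight` is never expanded to the batch size, but the
    # Python-level loop is slow.
    return torch.stack([weight @ cols[b] for b in range(cols.size(0))])
```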

mikael10j avatar Jan 14 '19 14:01 mikael10j

Hi @mikael10j. Does that fully solve the memory usage issue? (You might still have some small overhead with Python compared to the CUDA version.)

I am confident we can optimise the batch loop.
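For instance, folding the batch into the pixel dimension would replace both the broadcast and the Python loop with a single 2-D matmul (my own untested sketch, not necessarily the final fix):

```python
import torch

def conv_as_reshape(weight, cols):
    # weight: (out_feat, in_feat); cols: (batch, in_feat, npix).
    # A single 2-D matmul does all the work without expanding
    # `weight` or looping in Python.
    batch, in_feat, npix = cols.shape
    flat = cols.transpose(0, 1).reshape(in_feat, batch * npix)
    out = weight @ flat                                   # (out_feat, batch * npix)
    return out.reshape(-1, batch, npix).transpose(0, 1)   # (batch, out_feat, npix)
```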

vuillaut avatar Jan 14 '19 14:01 vuillaut

> Does that fully solve the memory usage issue?

Yes it does.

mikael10j avatar Jan 14 '19 14:01 mikael10j

Great news. Could you share the code in a PR please?

vuillaut avatar Jan 14 '19 14:01 vuillaut

See PR #20.

mikael10j avatar Jan 16 '19 09:01 mikael10j