IndexedConv
Memory usage
Originally opened by Mark Schoene: https://gitlab.lapp.in2p3.fr/GammaLearn/GammaLearn/issues/31
While investigating why indexedconv uses much more memory than the built-in conv and is noticeably slower, I found:
- https://discuss.pytorch.org/t/matmul-broadcasting-makes-copies/19494
- https://discuss.pytorch.org/t/memory-inefficient-in-batch-matrix-multiplication-with-autograd/28164
I'm running some tests to confirm that this is the problem.
The memory consumption observed with indexedconv comes from the matmul function, which broadcasts its operands before applying the matrix multiplication. The weight matrix has to be expanded to match the batch size, which also expands the autograd graph. The alternative is to compute the matrix multiplication in a for loop over the batch dimension (as the CUDA/C++ implementation of convolution does), but then speed becomes the issue, since a Python for loop is slow. A minimal sketch illustrating the two approaches follows below.
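Here is a minimal sketch of the broadcasting issue, not the actual IndexedConv code: the shapes and variable names (`weight`, `cols`, etc.) are hypothetical, assuming an im2col-style gathered input multiplied by a flattened weight matrix.

```python
import torch

# Hypothetical shapes chosen only for illustration.
batch, out_c, in_c, k, n_pix = 32, 64, 32, 7, 1855
weight = torch.randn(out_c, in_c * k, requires_grad=True)
cols = torch.randn(batch, in_c * k, n_pix)  # gathered, im2col-style input

# Broadcasting version: matmul expands `weight` to (batch, out_c, in_c*k)
# before multiplying, so a batch-sized copy of the weight is kept for autograd.
out_broadcast = torch.matmul(weight, cols)  # shape (batch, out_c, n_pix)

# Loop version: multiply one sample at a time, so `weight` is never expanded,
# at the cost of a slow Python loop over the batch.
out_loop = torch.stack([weight @ cols[b] for b in range(batch)])

assert torch.allclose(out_broadcast, out_loop, atol=1e-5)
```

Both variants compute the same result; the trade-off is memory (broadcast copies in the autograd graph) versus speed (Python-level iteration over the batch).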
Hi @mikael10j. Does that solve the memory usage entirely, then? (You might still have some small overhead with Python compared to the CUDA version.)
I am confident we can optimise the batch loop.
Yes, it does.
Great news. Could you share the code in a PR please?
See PR #20.