mish-cuda
Memory utilization during training.
As far as I can tell from the source code, this activation doesn't need to cache intermediate values to calculate gradients, since it recalculates the forward pass during the backward pass: https://github.com/thomasbrandon/mish-cuda/blob/master/csrc/mish.h#L26

Is this an accurate statement? Sorry if this is a dumb question; I haven't written any C++ PyTorch code, so I'm not sure how that API handles caching activations.
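In PyTorch Python terms, I think the equivalent would be something like this (a sketch of my understanding, not this repo's actual code; the class name is just illustrative):

```python
import torch
import torch.nn.functional as F

class MishRecompute(torch.autograd.Function):
    """Mish that saves only its input; intermediates are recomputed in backward.
    Illustrative sketch, not the actual CUDA/C++ implementation in this repo."""

    @staticmethod
    def forward(ctx, x):
        # Only the input tensor is saved -- no intermediate activations cached.
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Recompute the forward intermediates from the saved input.
        sp = F.softplus(x)
        tsp = torch.tanh(sp)
        # d/dx [x * tanh(softplus(x))] = tanh(sp) + x * sigmoid(x) * (1 - tanh(sp)^2)
        grad = tsp + x * torch.sigmoid(x) * (1.0 - tsp * tsp)
        return grad_out * grad

# Usage: y = MishRecompute.apply(x)
```

If I'm reading `mish.h` right, that's the trade-off: the backward kernel does a bit of extra compute to rebuild the `softplus`/`tanh` values from the input (which autograd keeps around anyway), instead of caching them during forward.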