habitat-lab
Autocast and fp16 mixed training -- faster training with less memory usage
Motivation and Context
Add support for torch.cuda.amp.autocast and fp16 mixed training. The former gives faster training; the latter gives faster training with less memory usage (although both can introduce instability, so they may not always work).
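For reference, here is a minimal, hypothetical sketch of how the autocast path is typically wired up with `torch.cuda.amp.autocast` and `GradScaler` (this is not the habitat-lab code; the toy model, batch sizes, and learning rate are placeholders):

```python
import torch

# Placeholder network/optimizer standing in for the actual policy and trainer.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    # Ops inside autocast run in fp16 where it is safe, fp32 otherwise.
    with torch.cuda.amp.autocast():
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, targets)

    # Scale the loss to avoid fp16 gradient underflow; GradScaler unscales
    # the gradients before the optimizer step and adapts the scale over time.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```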
How Has This Been Tested
Added integration tests. Training with these changes is stable for PointNav with GPS+Compass using these models; stability for other tasks is unknown.
Types of changes
- New feature (non-breaking change which adds functionality)
@mathfac @Skylion007 currently our CI is on quite an old version of PyTorch, and autocast only became usable for RNN models in 1.7.1. Do either of you have an issue with updating the CI?
Nope, go ahead.
Do you have a rough estimate of the possible speed/memory gains?
It is highly dependent on the exact workload, so it's impossible to say anything concrete. fp16 mixed can reduce memory usage by as much as 50% (since activations will all be in fp16). I've seen both autocast and fp16 mixed increase speed by as much as 2x. I've also seen them not increase speed at all -- there is overhead involved with both, so if the benefit from using tensor cores is less than or equal to that overhead, it won't matter.
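Since the gains are workload-dependent, here is a rough, hypothetical way to compare peak memory with and without autocast on your own model (the toy MLP and tensor sizes below are made up for illustration):

```python
import torch

def peak_memory_mb(use_autocast: bool) -> float:
    # Stand-in model; substitute your own network and batch here.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
    ).cuda()
    x = torch.randn(256, 1024, device="cuda")

    torch.cuda.reset_peak_memory_stats()
    with torch.cuda.amp.autocast(enabled=use_autocast):
        model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20

print("fp32 peak MB:", peak_memory_mb(False))
print("autocast peak MB:", peak_memory_mb(True))
```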
Yeah, the memory savings are the more beneficial part of this PR for sure.
@erikwijmans Wanted to see if we could get this PR merged as well. :)
I think I am gonna trim this to just the autocast version but keep this open so people can see an example of fp16 mixed if they want to try it. As I have played more with fp16 mixed, I've found it can be really hard to keep stable, so I am not sure shipping it makes sense.
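For anyone who wants to experiment anyway, here is a much-simplified, illustrative sketch of what "fp16 mixed" means (this is not the code in this PR, and real implementations usually also keep fp32 master weights): the model itself is cast to half precision and a static loss scale guards against gradient underflow. The toy model, optimizer settings, and loss scale are all placeholder values.

```python
import torch

model = torch.nn.Linear(128, 10).cuda().half()
# A larger eps than the default helps Adam behave with fp16 parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4, eps=1e-4)
loss_scale = 2.0 ** 10  # static loss scale; real setups often adjust this dynamically

for _ in range(100):
    inputs = torch.randn(32, 128, device="cuda", dtype=torch.float16)
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    # Scale the loss so small gradients don't underflow in fp16.
    (loss * loss_scale).backward()
    # Undo the scaling before the optimizer step.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(loss_scale)
    optimizer.step()
```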