habitat-lab
Autocast and fp16 mixed training -- faster training with less memory usage
Motivation and Context
Add support for torch.cuda.amp.autocast and fp16 mixed training. The former gives faster training; the latter gives faster training with less memory usage (although both can introduce instability, so they may not always work).
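For reference, here is a minimal, hypothetical sketch of how the autocast path is typically wired up with `torch.cuda.amp.autocast` and `GradScaler` (this is not the habitat-lab code; the toy model, batch sizes, and learning rate are placeholders):

```python
import torch

# Placeholder network/optimizer standing in for the actual policy and trainer.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    # Ops inside autocast run in fp16 where it is safe, fp32 otherwise.
    with torch.cuda.amp.autocast():
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, targets)

    # Scale the loss to avoid fp16 gradient underflow; GradScaler unscales
    # the gradients before the optimizer step and adapts the scale over time.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```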
How Has This Been Tested
Added integration tests. Training with these changes is stable for PointNav with GPS+Compass using these models; stability for other tasks is unknown.
Types of changes
- New feature (non-breaking change which adds functionality)
@mathfac @Skylion007 currently our CI is on quite an old version of PyTorch, and autocast only became usable for RNN models in 1.7.1. Do either of you have an issue with updating the CI?
Nope, go ahead.
Do you have a rough estimate of the possible speed/memory gains?
It is highly dependent on the exact workload, so it's impossible to say anything concrete. fp16 mixed can reduce memory usage by as much as 50% (since activations will all be in fp16). I've seen both autocast and fp16 mixed increase speed by as much as 2x. I've also seen them not increase speed at all -- there is overhead involved with both, so if the benefit from using tensor cores is less than or equal to that overhead, it won't matter.
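Since the gains are workload-dependent, here is a rough, hypothetical way to compare peak memory with and without autocast on your own model (the toy MLP and tensor sizes below are made up for illustration):

```python
import torch

def peak_memory_mb(use_autocast: bool) -> float:
    # Stand-in model; substitute your own network and batch here.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
    ).cuda()
    x = torch.randn(256, 1024, device="cuda")

    torch.cuda.reset_peak_memory_stats()
    with torch.cuda.amp.autocast(enabled=use_autocast):
        model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20

print("fp32 peak MB:", peak_memory_mb(False))
print("autocast peak MB:", peak_memory_mb(True))
```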
Yeah, the memory savings are the more beneficial part of this PR for sure.
@erikwijmans Wanted to see if we could get this PR merged as well. :)
I think I am gonna trim this to just the autocast version but keep this open so people can see an example of fp16 mixed if they want to try it. As I have played more with fp16 mixed, I've found it can be really hard to keep stable, so I am not sure shipping it makes sense.
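For anyone who wants to experiment anyway, here is a much-simplified, illustrative sketch of what "fp16 mixed" means (this is not the code in this PR, and real implementations usually also keep fp32 master weights): the model itself is cast to half precision and a static loss scale guards against gradient underflow. The toy model, optimizer settings, and loss scale are all placeholder values.

```python
import torch

model = torch.nn.Linear(128, 10).cuda().half()
# A larger eps than the default helps Adam behave with fp16 parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4, eps=1e-4)
loss_scale = 2.0 ** 10  # static loss scale; real setups often adjust this dynamically

for _ in range(100):
    inputs = torch.randn(32, 128, device="cuda", dtype=torch.float16)
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    # Scale the loss so small gradients don't underflow in fp16.
    (loss * loss_scale).backward()
    # Undo the scaling before the optimizer step.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(loss_scale)
    optimizer.step()
```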