--fp16 Slower & Does Not Reduce Memory Use
Hey there @lucidrains,
Came across your incredible work and immediately tried it out on my RTX 2070! Since training takes a long time and a lot of memory, I was relieved to see that we can use APEX/Amp to train the model by simply adding the --fp16 option.
Unfortunately for me, memory usage did not go down compared to regular fp32 training, and training was actually slower too.
I came across a similar issue, #129, but it was closed before a fix was checked in. Will you still continue to work on fp16? I believe it would help many of your users (and fans!)
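For reference, this is roughly what an APEX/Amp mixed-precision step looks like; a minimal sketch only, assuming a placeholder model, optimizer, and loss rather than this repo's actual `--fp16` code path:

```python
# Sketch of APEX/Amp mixed precision (placeholders, not this repo's code).
import torch
from apex import amp

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# opt_level "O1" patches ops to run in fp16 where safe and keeps master
# weights in fp32; the memory savings come mostly from fp16 activations.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(16, 512, device="cuda")
loss = model(x).pow(2).mean()

# scale_loss applies dynamic loss scaling so small fp16 gradients
# don't underflow to zero during backward.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```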
Hi Raye! I don't know why, but mixed precision no longer brings the memory down. It was the same when I tried switching to amp. I'm not sure what's wrong, but I'm out of time (moving). Maybe someone else can figure this out!
@RayeRTX are you getting good results? please share :)
A bit unrelated, but I can't even get it running - I just keep getting NaN errors and training shuts down.
@tannisroot yeah, I get that feedback a lot. I think I will just remove this feature from the readme and keep it as a silent feature. Perhaps someone can help figure out what's wrong. It has worked for me in the past, so I'm not sure what changed.
Still trying out various settings, let's see what we get!
Any chance someone figured out why fp16 is not working?
@lucidrains added PyTorch's Amp to his Lightweight-GAN repo and it works great on my Titan RTX!
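For anyone who wants to try the same approach here, this is roughly what a native `torch.cuda.amp` training step looks like; the model and optimizer below are illustrative placeholders, not Lightweight-GAN's code:

```python
# Sketch of PyTorch-native AMP (autocast + GradScaler); placeholders only.
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scaler = GradScaler()  # dynamic loss scaling guards against fp16 underflow

for _ in range(10):
    x = torch.randn(16, 512, device="cuda")
    optimizer.zero_grad()
    with autocast():               # ops run in fp16 where it is safe
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips step on inf/NaN
    scaler.update()                # adjusts the scale factor
```

Note that `scaler.step` skips the optimizer update when it finds inf/NaN gradients, which may also be relevant to the NaN errors mentioned above.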
@TKassis That's very useful info! Do you find Lightweight-GAN works better for you?
It trains much, much faster, but I haven't compared the two on the same training data.