
Reduce training memory requirement

Open LifeIsStrange opened this issue 2 years ago • 2 comments

CUDA-enabled machine (48 GB to train, 4 GB to evaluate)

@vdobrovolskii friendly ping. Are 48 GB really needed to train? Couldn't we train for longer (and if so, how much longer) with less memory? Couldn't the project leverage FP16, FP8, and other optimizations? You get them out of the box if you use RoBERTa from the Transformers library (https://github.com/huggingface/transformers), and there is also Accelerate: https://huggingface.co/docs/accelerate/index

I have a 3070 with 8GB of GDDR6 :/

LifeIsStrange • Dec 01 '22
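For illustration, here is a minimal sketch of what FP16 training with gradient accumulation could look like in plain PyTorch (torch.cuda.amp), two of the standard ways to trade speed for memory. The model, dataloader, and loss interface below are hypothetical placeholders, not wl-coref's actual training code:

```python
# Minimal sketch: FP16 autocast + gradient accumulation to cut training memory.
# `model`, `dataloader`, and the loss interface are placeholders, not wl-coref code.
import torch
from torch.cuda.amp import GradScaler, autocast

def train_epoch(model, dataloader, optimizer, accum_steps=4):
    scaler = GradScaler()                  # loss scaling to avoid FP16 underflow
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        with autocast():                   # run the forward pass in FP16 where safe
            loss = model(**batch) / accum_steps
        scaler.scale(loss).backward()      # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)         # unscale grads, then optimizer.step()
            scaler.update()
            optimizer.zero_grad()
```

Gradient accumulation keeps the effective batch size while holding fewer activations in memory per step, which is the "train longer with less" trade-off asked about above.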

It should be totally possible to reduce the training requirements. I've been thinking for a long time about rewriting the project (it grew out of another project and there's a lot of legacy in it) with PyTorch Lightning, to allow easy access to those optimizations, multi-GPU training, etc.

I just haven't had time to do that yet :(

vdobrovolskii • Dec 01 '22
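As a rough idea of the kind of rewrite being described, a Lightning wrapper would turn mixed precision, gradient accumulation, and multi-GPU training into Trainer flags. The `CorefModel`-style module and `train_loader` below are placeholder names, not wl-coref identifiers:

```python
# Hypothetical sketch: wrapping a coreference model in PyTorch Lightning so that
# mixed precision and multi-GPU training become Trainer flags rather than custom code.
import pytorch_lightning as pl
import torch

class CorefLightningModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def training_step(self, batch, batch_idx):
        loss = self.model(**batch)          # placeholder loss interface
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

trainer = pl.Trainer(
    precision="16-mixed",                   # FP16 autocast with loss scaling
    accumulate_grad_batches=4,              # trade throughput for memory
    devices="auto",                         # use all available GPUs
    max_epochs=20,
)
# trainer.fit(CorefLightningModule(model), train_loader)
```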

Happy to hear that :) No worries, you don't owe us anything, but it would be great if you find the time/energy/will. Please ping me if that happens someday!

LifeIsStrange • Dec 01 '22