Multiple GPUs? How to train bigger models ...
I found your aitextgen Colab ( https://colab.research.google.com/drive/15qBZx5y9rdaQSyWpsreMDnTiZ5IlN0zD?usp=sharing ) for finetuning GPT-2.
Have you ever tried to use Pytorch Lightning to finetune a larger model on multiple GPUs?
Could you help me with that? What would I need to change in your Colab? :)
Are there other ways to finetune bigger models, like the 1.5B or the 0.7B versions?
Kind regards, Christoph
From what I can tell, this project is missing some key scalability features, so big-model training currently has to be done either on a single GPU with a lot of VRAM or on the CPU.
I'm currently tuning the 774M model using 25,000,000 lines of text as the input (2 GB). To do this I needed 624 GB of RAM just to tokenize the dataset (or swap and a lot of free time). In theory the token merger would have helped with this issue, but it's currently broken. Once the data was tokenized, CPU training only uses 12 of the 24 threads I have available, and it peaks at 65 GB of memory with default settings. This method of training can do 5,000 iterations in about 16 hours.
My memory usage is much higher than would typically be expected due to the amount of text input I'm feeding in. With input in the 30 MB range, you might be able to tokenize your data and train the 774M model on a GPU with 24 GB of VRAM. I say "might" because my CPU testing right now shows nearly 40 GB of memory usage while training with the parameters I described.
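As a general workaround for the tokenization RAM spike (independent of aitextgen, whose encoder works differently), you can tokenize the file in a streaming fashion instead of materializing all 25M lines at once. A minimal sketch with a stand-in whitespace "tokenizer" (the vocabulary-lookup encoder here is an illustration, not aitextgen's actual tokenizer):

```python
# Sketch: stream a large text corpus and tokenize line by line,
# yielding token ids incrementally instead of holding every
# tokenized line in RAM at once. The "tokenizer" is a stand-in
# (plain whitespace split + vocabulary lookup), not aitextgen's.

def build_vocab(lines):
    """Assign an integer id to each whitespace token as it is first seen."""
    vocab = {}
    for line in lines:
        for tok in line.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode_stream(lines, vocab):
    """Yield one list of token ids per line; only one line in memory at a time."""
    for line in lines:
        yield [vocab[tok] for tok in line.split()]

# In practice `corpus` would be an open file handle iterated lazily.
corpus = ["the quick brown fox", "the lazy dog"]
vocab = build_vocab(corpus)
encoded = list(encode_stream(corpus, vocab))
print(encoded)  # [[0, 1, 2, 3], [0, 4, 5]]
```

With a real file you would pass the file object itself as `lines` and write each encoded line out to disk as it is produced, keeping peak RAM proportional to one line rather than the whole dataset.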
This should be automatic w/ PyTorch Lightning, but I have not explicitly tested it yet.
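For reference, multi-GPU training in PyTorch Lightning itself is mostly a matter of Trainer flags. A hedged sketch of the configuration, assuming a Lightning 1.x-style API (flag names have changed across versions, and `model`/`train_dataloader` are placeholders, not aitextgen objects):

```python
from pytorch_lightning import Trainer

# Lightning 1.x style: request 2 GPUs with DistributedDataParallel.
# Newer Lightning versions use the equivalent
# Trainer(accelerator="gpu", devices=2, strategy="ddp") instead.
trainer = Trainer(gpus=2, accelerator="ddp", max_steps=5000)

# trainer.fit(model, train_dataloader)  # placeholders for your LightningModule/data
```

Whether aitextgen exposes these flags directly from its own `train()` call is a separate question; this only shows what Lightning expects underneath.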