Larger models like GPT-J and GPT-NeoX-20B
Has this library been tested with larger models such as GPT-J-6B and GPT-NeoX-20B? Are there plans to support larger models like these? Thanks.
To use these large models, you will need to parallelize them across multiple GPUs, because they won't fit on a single GPU. I think the README mentions that data-parallel support is implemented. It should also be easy to plug in other distributed training methods (e.g., DeepSpeed) by editing the trainer in the code, since the library uses the HF Trainer.
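For what it's worth, since the HF Trainer accepts a DeepSpeed config directly, something along these lines might work as a starting point. This is just a minimal ZeRO stage-3 sketch (the `"auto"` values let the Trainer fill in its own batch-size and precision settings), not something I've tested with this library:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu" },
    "offload_optimizer": { "device": "cpu" }
  },
  "fp16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

You would pass the path to this file via `TrainingArguments(deepspeed="ds_config.json")` (or the equivalent spot where this library constructs its `TrainingArguments`) and launch with `deepspeed` instead of `python`. ZeRO-3 with CPU offload is usually what makes GPT-J-6B / GPT-NeoX-20B feasible on limited GPU memory.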
Thanks for the response @Alymostafa. I am still wondering if any experiments have been done to show that the library can get good results on larger models.
In their paper, I think they use T5 (220M) and GPT-2 (117M). But the same methodology is applied in other papers (e.g., InstructGPT), so you could give it a shot.