Larger models like GPT-J and GPT-NeoX-20B
Has this library been tested with larger models such as GPT-J-6B and GPT-NeoX-20B? Are there plans to support larger models like these? Thanks.
To use these large models, you will need to parallelize them across multiple GPUs, because they won't fit on a single GPU. I think the README mentions that data-parallel support is implemented. It should also be easy to plug in other distributed training methods (e.g., DeepSpeed) by editing the trainer in the code, since the library uses the HF Trainer.
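For what it's worth, since the HF Trainer accepts a DeepSpeed config directly, something along these lines might work as a starting point. This is just a minimal ZeRO stage-3 sketch (the `"auto"` values let the Trainer fill in its own batch-size and precision settings), not something I've tested with this library:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu" },
    "offload_optimizer": { "device": "cpu" }
  },
  "fp16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

You would pass the path to this file via `TrainingArguments(deepspeed="ds_config.json")` (or the equivalent spot where this library constructs its `TrainingArguments`) and launch with `deepspeed` instead of `python`. ZeRO-3 with CPU offload is usually what makes GPT-J-6B / GPT-NeoX-20B feasible on limited GPU memory.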
Thanks for the response @Alymostafa. I am still wondering if any experiments have been done to show that the library can get good results on larger models.
In their paper, I think they use T5 (220M) and GPT-2 (117M). But the same methodology is applied in other papers (e.g., InstructGPT), so you could give it a shot.