Stella Biderman
It was raised in https://github.com/EleutherAI/gpt-neox/issues/482#issuecomment-996767144 that the QuickStart default settings aren't actually intended to be used to train a model to completion, and that this is confusing to new users....
**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...
**Describe the bug** Running the model gives the following warning: `[2021-11-20 20:08:18,491] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.` We should update the way that our code...
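On the DeepSpeed side, the deprecated boolean flag is replaced by a structured offload entry. A minimal sketch of the relevant `zero_optimization` section, assuming ZeRO stage 2 (the actual stage and surrounding fields depend on the real NeoX config):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

The old form, `"cpu_offload": true` inside `zero_optimization`, is what triggers the warning; the `offload_optimizer` object supersedes it.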
**Describe the bug** It appears that imbalances in the distillation weights have a significant impact on performance. When I set them all equal to 1, it runs twice as fast...
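The distillation code in question isn't shown here, so purely as an illustration, here is a generic sketch of how per-term weights enter a distillation loss. The function names and the `w_hard`/`w_soft` weights are hypothetical, not GPT-NeoX's actual API; the point is only that the total loss is a weighted sum, so uneven weights rescale the gradient contributions of each term:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, target_idx,
                 w_hard=1.0, w_soft=1.0, T=2.0):
    """Weighted sum of a hard-label cross-entropy term and a
    soft-label KL term against the teacher's distribution."""
    # Hard term: standard cross-entropy on the ground-truth label.
    ce = -math.log(softmax(student_logits)[target_idx])
    # Soft term: KL(teacher || student) at temperature T.
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    kl = sum(qt * math.log(qt / qs) for qt, qs in zip(q_t, q_s))
    return w_hard * ce + w_soft * kl
```

With `w_hard = w_soft = 1` the two terms contribute on equal footing, which matches the issue's observation that setting all weights to 1 changes behavior.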
@preethamgali wrote a model distillation framework [here](https://github.com/EleutherAI/distilling) which we should aim to integrate into GPT-NeoX
I train [large language models](https://github.com/EleutherAI/gpt-neox) using DeepSpeed's ZeRO optimizer. Does this library support ZeRO?