Stella Biderman

29 issues by Stella Biderman

It was raised in https://github.com/EleutherAI/gpt-neox/issues/482?notification_referrer_id=NT_kwDOAPKasLMyODMxNjY1ODU3OjE1ODk5MzEy#issuecomment-996767144 that the QuickStart default settings aren’t actually intended to be used to train a model to completion, and that this is confusing to new users....

feature request

**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...

feature request

**Describe the bug** Running the model gives the following warning: `[2021-11-20 20:08:18,491] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.` We should update the way that our code...

bug
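
For the deprecation warning above, here is a minimal sketch of what the config migration might look like. It assumes the config is a plain Python dict with the deprecated boolean `cpu_offload` under `zero_optimization`, and that `{"device": "cpu"}` is the intended `offload_optimizer` setting. The helper name `migrate_cpu_offload` is hypothetical and is not part of DeepSpeed or GPT-NeoX.

```python
import copy


def migrate_cpu_offload(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed-style config with `cpu_offload` replaced.

    Hypothetical helper: assumes the deprecated flag lives under
    `zero_optimization` and maps to `offload_optimizer: {"device": "cpu"}`.
    """
    cfg = copy.deepcopy(ds_config)  # leave the caller's dict untouched
    zero = cfg.get("zero_optimization")
    if isinstance(zero, dict) and "cpu_offload" in zero:
        enabled = zero.pop("cpu_offload")
        # Only add the new key if offload was on and nothing is set already.
        if enabled and "offload_optimizer" not in zero:
            zero["offload_optimizer"] = {"device": "cpu"}
    return cfg


old_config = {"zero_optimization": {"stage": 2, "cpu_offload": True}}
new_config = migrate_cpu_offload(old_config)
print(new_config)
```

If `cpu_offload` was `False`, the sketch simply drops the key, so offloading stays disabled.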

**Describe the bug** It appears that imbalances in the distillation weights have a significant impact on performance. When I set them all equal to 1, it runs twice as fast...

bug

@preethamgali wrote a model distillation framework [here](https://github.com/EleutherAI/distilling), which we should aim to integrate into GPT-NeoX.

feature request

I train [large language models](https://github.com/EleutherAI/gpt-neox) using DeepSpeed's ZeRO optimizer. Does this library support ZeRO?

feature request
good first issue