Stella Biderman
It was raised in https://github.com/EleutherAI/gpt-neox/issues/482#issuecomment-996767144 that the QuickStart default settings aren't actually intended to be used to train a model to completion, and that this is confusing to new users....
**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...
**Describe the bug** Running the model gives the following warning: `[2021-11-20 20:08:18,491] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.` We should update the way that our code...
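On the DeepSpeed side, the deprecated boolean flag is replaced by a structured offload entry. A minimal sketch of the relevant `zero_optimization` section, assuming ZeRO stage 2 (the actual stage and surrounding fields depend on the real NeoX config):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

The old form, `"cpu_offload": true` inside `zero_optimization`, is what triggers the warning; the `offload_optimizer` object supersedes it.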
**Describe the bug** It appears that imbalances in the distillation weights have a significant impact on performance. When I set them all equal to 1, it runs twice as fast...
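The distillation code in question isn't shown here, so purely as an illustration, here is a generic sketch of how per-term weights enter a distillation loss. The function names and the `w_hard`/`w_soft` weights are hypothetical, not GPT-NeoX's actual API; the point is only that the total loss is a weighted sum, so uneven weights rescale the gradient contributions of each term:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, target_idx,
                 w_hard=1.0, w_soft=1.0, T=2.0):
    """Weighted sum of a hard-label cross-entropy term and a
    soft-label KL term against the teacher's distribution."""
    # Hard term: standard cross-entropy on the ground-truth label.
    ce = -math.log(softmax(student_logits)[target_idx])
    # Soft term: KL(teacher || student) at temperature T.
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    kl = sum(qt * math.log(qt / qs) for qt, qs in zip(q_t, q_s))
    return w_hard * ce + w_soft * kl
```

With `w_hard = w_soft = 1` the two terms contribute on equal footing, which matches the issue's observation that setting all weights to 1 changes behavior.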
@preethamgali wrote a model distillation framework [here](https://github.com/EleutherAI/distilling) which we should aim to integrate into GPT-NeoX
I train [large language models](https://github.com/EleutherAI/gpt-neox) using DeepSpeed's ZeRO optimizer. Does this library support ZeRO?