gpt-neox
An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
**Describe the bug** When starting with `small.yml`, then changing the ZeRO stage to 2 and `cpu_offload` to `true`, I get the following error:

```
RuntimeError: expected input to be on cuda
```
...
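For context, a minimal sketch of the configuration change being described, written as a DeepSpeed-style dict; the key names follow DeepSpeed's `zero_optimization` schema, and everything else in `small.yml` is assumed to keep its defaults:

```python
# Hypothetical reproduction of the reported setup: ZeRO stage 2 with the
# (since-deprecated) cpu_offload flag enabled. Only the relevant section
# is shown; the rest of small.yml is assumed unchanged.
zero_config = {
    "zero_optimization": {
        "stage": 2,           # changed from the stage shipped in small.yml
        "cpu_offload": True,  # offloads optimizer state to CPU memory
    },
}
```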
**Describe the bug** Running the model gives the following warning:

```
[2021-11-20 20:08:18,491] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
```

We should update the way that our code...
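The warning itself names the replacement key. A hedged sketch of the migration, using DeepSpeed's documented `offload_optimizer` form (the exact shape of our configs may differ):

```python
# Deprecated form that triggers the warning above:
old_config = {"zero_optimization": {"stage": 2, "cpu_offload": True}}

# Replacement suggested by the warning: offload_optimizer with an
# explicit target device.
new_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}
```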
The requirements files in ./requirements have historically been strictly pinned to prevent CI Docker images from changing without our prior knowledge. However, this places a burden on users who would...
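For illustration, the difference between a strict pin and the looser bound users might prefer; the package and version numbers here are hypothetical examples, not the repo's actual pins:

```
# strict pin: reproducible CI Docker images
deepspeed==0.3.15

# relaxed bound: easier for users to satisfy alongside other packages
deepspeed>=0.3.15,<0.4
```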
**Describe the bug** The preprocess_data script expects a "text" column in the JSON input regardless of the json-keys passed in the arguments. This is because lmd.Reader(fname).stream_data() expects to have...
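Until the script honors the passed keys, one hedged workaround is to rewrite the input so the desired field appears under "text"; the file names and the source key ("content") below are hypothetical:

```python
import json

# Rewrite each JSONL record so the field we care about is exposed under
# the "text" key that the preprocessing reader currently hardcodes.
with open("data.jsonl") as src, open("data_text_key.jsonl", "w") as dst:
    for line in src:
        record = json.loads(line)
        record["text"] = record.pop("content")  # "content" is an assumed key
        dst.write(json.dumps(record) + "\n")
```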
Getting this error on import of deepspeed. I am currently using torch 1.8.0 and installed the dependencies from requirements.txt as directed. I am also unable to install apex from the link provided.
**Describe the bug** It appears that imbalances in the distillation weights have a significant impact on performance. When I set them all equal to 1, it runs twice as fast...
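For reference, a minimal sketch of a weighted distillation loss in the standard (Hinton-style) form; the function and weight names are assumptions, not the repo's actual distillation code:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      w_ce=1.0, w_kl=1.0, temperature=1.0):
    # Hard-label cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL divergence against the teacher's distribution,
    # scaled by T^2 as in standard knowledge distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # The w_ce / w_kl weights are what the issue describes as imbalanced.
    return w_ce * ce + w_kl * kl
```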
**Is your feature request related to a problem? Please describe.** I'm frustrated because I can't use my GeForce MX 250 to train a 13B GPT-NeoX. **Describe the solution you'd like**...
@preethamgali wrote a model distillation framework [here](https://github.com/EleutherAI/distilling), which we should aim to integrate into GPT-NeoX.
**Describe the bug** Loss for RPE position embedding is not going down:

```
[2021-05-04 15:45:14,710] [INFO] [unfused_optimizer.py:246:_update_scale] Grad overflow on iteration: 50
[2021-05-04 15:45:14,710] [INFO] [unfused_optimizer.py:246:_update_scale] Grad overflow on iteration: 50
[2021-05-04 ...
```