gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Results 203 gpt-neox issues

**Describe the bug** When starting with `small.yml`, then setting the ZeRO stage to 2 and `cpu_offload` to `true`, I get the following error: `RuntimeError: expected input to be on cuda`...

bug

**Describe the bug** Running the model gives the following warning: `[2021-11-20 20:08:18,491] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer. ` We should update the way that our code...

bug
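The deprecation warning above points from `cpu_offload` to `offload_optimizer`. As a rough sketch of what such a migration could look like, the helper below rewrites a ZeRO settings dictionary. The function name `migrate_cpu_offload` is hypothetical, and the assumption that the flag lives in a plain dict (rather than wherever GPT-NeoX actually assembles its DeepSpeed config) is mine, not the issue's.

```python
import copy


def migrate_cpu_offload(zero_cfg):
    """Return a copy of a ZeRO settings dict with the deprecated
    `cpu_offload` flag rewritten as an `offload_optimizer` block.

    Hypothetical helper for illustration; not part of GPT-NeoX or DeepSpeed.
    """
    cfg = copy.deepcopy(zero_cfg)
    # Remove the legacy flag whether it was true or false.
    legacy = cfg.pop("cpu_offload", None)
    # Only add the new block if offload was requested and not already set.
    if legacy and "offload_optimizer" not in cfg:
        cfg["offload_optimizer"] = {"device": "cpu"}
    return cfg


old = {"stage": 2, "cpu_offload": True}
new = migrate_cpu_offload(old)
print(new)
```

The input dictionary is left untouched, so the legacy and migrated settings can be compared side by side.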

The requirements files specified in ./requirements have historically been strict, so as to prevent CI Docker images from changing without our prior knowledge. However, this places a burden on users who would...

**Describe the bug** The preprocess_data script expects the JSON input to have a "text" column regardless of the json-keys passed in the arguments. This is because lmd.Reader(fname).stream_data() expects to have...

bug
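To illustrate the behaviour the report above is asking for, here is a minimal sketch of a JSON Lines reader that looks up whichever keys the caller passes instead of a fixed "text" key. It uses only the standard library; `stream_fields` and its `json_keys` parameter are hypothetical names and do not reflect the actual lm_dataformat or preprocess_data code.

```python
import io
import json


def stream_fields(lines, json_keys=("text",)):
    """Yield one dict per JSON Lines record, keeping only `json_keys`.

    Hypothetical helper for illustration; raises KeyError if a record
    lacks one of the requested keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        yield {key: record[key] for key in json_keys}


# Two records that store their text under "content" rather than "text".
sample = io.StringIO('{"content": "hello"}\n\n{"content": "world"}\n')
rows = list(stream_fields(sample, json_keys=["content"]))
print(rows)
```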

Getting this error on import of deepspeed. I am currently using torch 1.8.0 and installed the requirements.txt as directed. I am also not able to install the apex link provided.

bug

**Describe the bug** It appears that imbalances in the distillation weights have a significant impact on performance. When I set them all equal to 1, it runs twice as fast...

bug

**Is your feature request related to a problem? Please describe.** I'm frustrated because I can't use my GeForce MX 250 to train a 13B GPT-NeoX. **Describe the solution you'd like**...

feature request

@preethamgali wrote a model distillation framework [here](https://github.com/EleutherAI/distilling), which we should aim to integrate into GPT-NeoX.

feature request

**Describe the bug** Loss for RPE position embedding is not going down: `[2021-05-04 15:45:14,710] [INFO] [unfused_optimizer.py:246:_update_scale] Grad overflow on iteration: 50` `[2021-05-04 15:45:14,710] [INFO] [unfused_optimizer.py:246:_update_scale] Grad overflow on iteration: 50` [2021-05-04...

bug