
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Results: 203 gpt-neox issues

When using a model trained with ALiBi positional embeddings for inference, the cached matrix gets invalidated and recomputed after every generated token, which is very expensive. This PR offers an...
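
For context, the ALiBi bias is a fixed matrix of per-head slopes multiplied by relative token distances, so it only needs to be rebuilt when the sequence outgrows what was cached. A minimal sketch of that caching idea (function and variable names here are illustrative, not the PR's actual code):

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Per-head geometric slopes as in the ALiBi paper (assumes a
    # power-of-two head count; illustrative only).
    start = 2 ** (-(2 ** -(math.log2(num_heads) - 3)))
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

_cached_bias = None  # reused across decoding steps

def get_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (num_heads, seq_len, seq_len) bias, rebuilding it only
    when the requested length exceeds the cached size."""
    global _cached_bias
    if _cached_bias is None or _cached_bias.shape[-1] < seq_len:
        pos = torch.arange(seq_len)
        distances = (pos[None, :] - pos[:, None]).tril()  # j - i for past positions
        _cached_bias = alibi_slopes(num_heads)[:, None, None] * distances
    return _cached_bias[:, :seq_len, :seq_len]
```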

I'm currently trying to run a configuration on 2 GPUs. It was running okay until I ran into the following error: File "/mnt/home/limsue/anaconda3/envs/GPTNEO/lib/python3.10/site-packages/datasets/table.py", line 21, in wrapper out = wraps(arrow_table_method)(method) File...

**Describe the bug** Setting `"text-gen-type": "interactive"` results in an `IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]`. Other generation types work. **To Reproduce**...

bug
good first issue

**Describe the bug** RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight: copying a param with shape torch.Size([25216, 6144]) from checkpoint, the shape in current model is torch.Size([50304,...

bug
good first issue
help wanted
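
The shape mismatch above (25216 vs. 50304) is characteristic of the padded vocabulary size changing between save and load: the embedding table is padded so it divides evenly across model-parallel ranks, so a checkpoint written with one model-parallel/padding setting will not load into a model built with another. A rough sketch of the padding arithmetic (the function name is an illustration, not the library's API):

```python
def padded_vocab_size(orig_vocab: int, divisible_by: int, mp_size: int) -> int:
    """Round the vocabulary up so each model-parallel rank gets an equal,
    evenly divisible slice (Megatron-style padding; illustrative only)."""
    multiple = divisible_by * mp_size
    return ((orig_vocab + multiple - 1) // multiple) * multiple

# With a GPT-2-sized vocabulary of 50257 tokens and divisible-by 128:
print(padded_vocab_size(50257, 128, 1))       # 50304 -> full table at mp=1
print(padded_vocab_size(50257, 128, 2) // 2)  # 25216 -> per-rank slice at mp=2
```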

**Describe the bug** It seems like there's an issue with the dependencies. Error Output ```tex $ pip install -r requirements/requirements.txt Defaulting to user installation because normal site-packages is not writeable...

bug

**Describe the bug** Unable to run evaluate.py with a gpt-neox model trained with pp=0, mp=1. **To Reproduce** Train a 13B model with ZeRO stage 2, pp=0, mp=1. Save checkpoint....

bug

**Describe the bug** Even though temperature is set to 0.0 in the configs, GPT-NeoX's generate.py (interactive mode) produces different outputs for the same input submitted multiple times in...

bug
good first issue
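
By convention, a temperature of 0.0 is usually treated as a request for greedy (argmax) decoding rather than as a literal divisor, which makes repeated runs on the same input deterministic; if the sampling path is taken anyway, outputs will differ between runs. A minimal sketch of that convention (not GPT-NeoX's actual sampling code):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Pick the next token id from a (vocab_size,) logits vector."""
    if temperature == 0.0:
        # Greedy decoding: deterministic, so repeated prompts give identical output.
        return torch.argmax(logits, dim=-1)
    # Temperature-scaled sampling: stochastic by design.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```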

Hi, I am attempting to finetune the 20B model, using the provided `configs/20B.yaml` edited with the following settings: - Dropping `pipe-parallel-size` to 1 - Adding `finetune=true` - Dropping `train_micro_batch_size_per_gpu`...
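
For reference, the overrides called out above could be applied to a copy of the stock config with a few lines of Python, assuming the file parses as plain YAML; the path and key names are taken from the issue, while the output filename is an assumption:

```python
import yaml

# Illustrative only: apply the edits described in the issue to the 20B config.
with open("configs/20B.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["pipe-parallel-size"] = 1  # drop pipeline parallelism to 1
cfg["finetune"] = True         # add finetune=true as described in the issue

# The issue also changes train_micro_batch_size_per_gpu, but that part of the
# excerpt is truncated, so it is omitted here.

with open("configs/20B_finetune.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```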