gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
When using a model trained with ALiBi positional embeddings for inference, the cached matrix gets invalidated and recomputed after every generated token, which is very expensive. This PR offers an...
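A minimal sketch of the kind of caching this describes, assuming a per-layer helper that keeps the ALiBi bias around and only rebuilds it when the sequence outgrows it (class and method names here are illustrative, not gpt-neox's actual API):

```python
import torch

class AlibiCache:
    """Illustrative cache for the ALiBi attention bias across generation steps."""

    def __init__(self, num_heads):
        self.num_heads = num_heads
        self.cached_bias = None  # [num_heads, cached_len, cached_len]

    def get_bias(self, seq_len, device):
        # Rebuild only when the cached matrix is too small; otherwise slice it,
        # instead of recomputing from scratch after every generated token.
        if self.cached_bias is None or self.cached_bias.size(-1) < seq_len:
            slopes = self._slopes(device)                    # [num_heads]
            pos = torch.arange(seq_len, device=device)
            rel = (pos[None, :] - pos[:, None]).float()      # [seq_len, seq_len]
            self.cached_bias = slopes[:, None, None] * rel[None, :, :]
        return self.cached_bias[:, :seq_len, :seq_len]

    def _slopes(self, device):
        # Geometric head slopes from the ALiBi paper, assuming num_heads is a power of 2.
        start = 2 ** (-8.0 / self.num_heads)
        return torch.tensor([start ** (i + 1) for i in range(self.num_heads)], device=device)
```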
I'm currently trying to run a configuration on 2 GPUs. It was running okay until I ran into the following error: File "/mnt/home/limsue/anaconda3/envs/GPTNEO/lib/python3.10/site-packages/datasets/table.py", line 21, in wrapper out = wraps(arrow_table_method)(method) File...
**Describe the bug** Setting `"text-gen-type": "interactive"` results in an `IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]`. Other generation types work. **To Reproduce**...
**Describe the bug** RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight: copying a param with shape torch.Size([25216, 6144]) from checkpoint, the shape in current model is torch.Size([50304,...
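The two shapes in that error are consistent with Megatron-style vocabulary padding, where the padded vocab depends on both `make_vocab_size_divisible_by` and the model-parallel size. A hedged illustration of the arithmetic (the function name is mine; the rule is the standard pad-then-partition scheme):

```python
import math

def padded_vocab_size(vocab_size, divisible_by, model_parallel_size):
    # Pad the vocabulary up to a multiple of divisible_by * model_parallel_size;
    # each model-parallel rank then holds an equal slice of the embedding rows.
    multiple = divisible_by * model_parallel_size
    return int(math.ceil(vocab_size / multiple) * multiple)

vocab = 50257  # GPT-2 tokenizer size
print(padded_vocab_size(vocab, 128, 1))       # 50304 -> full embedding [50304, 6144]
print(padded_vocab_size(vocab, 128, 2) // 2)  # 25216 -> per-rank shard  [25216, 6144]
```

If that is what happened here, the checkpoint was saved under a different model-parallel/padding setting than the model being built, and its embedding shards would need to be merged or re-partitioned before loading.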
**Describe the bug** It seems like there's an issue with the dependencies. Error Output ```tex $ pip install -r requirements/requirements.txt Defaulting to user installation because normal site-packages is not writeable...
**Describe the bug** Unable to run evaluate.py with a gpt-neox model trained with pp=0, mp=1. **To Reproduce** Train a 13B model with zero stage 2, pp=0, mp=1. Save checkpoint....
**Describe the bug** Even though temperature is set to 0.0 in the configs, GPT-NeoX's generate.py (interactive mode) produces different outputs for the same input submitted multiple times in...
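For reference, a temperature of exactly 0.0 is normally interpreted as greedy decoding, which is deterministic for a fixed input; differing outputs suggest sampling is still happening somewhere. A minimal sketch of the expected behaviour (names are illustrative, not the repo's actual functions):

```python
import torch

def sample_next_token(logits, temperature=1.0, generator=None):
    if temperature == 0.0:
        # Greedy decoding: always pick the highest-scoring token (deterministic).
        return torch.argmax(logits, dim=-1)
    # Temperature > 0: scale logits and sample, which is stochastic unless a
    # seeded generator is supplied.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=generator).squeeze(-1)
```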
Closes #588
Hi, I am attempting to finetune the 20B model, using the provided `configs/20B.yaml` edited with the following settings: - Dropping `pipe-parallel-size` to 1 - Adding `finetune=true` - Dropping `train_micro_batch_size_per_gpu`...
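A sketch of what the first two overrides could look like, written here as a plain Python dict for illustration (gpt-neox actually reads these keys from the YAML config; the truncated third edit is left out):

```python
finetune_overrides = {
    "pipe-parallel-size": 1,  # single pipeline stage
    "finetune": True,         # load model weights only; do not resume optimizer/iteration state
}
```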