gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
When using a model trained with ALiBi positional embeddings for inference, the cached matrix gets invalidated and recomputed after every generated token, which is very expensive. This PR offers an...
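A minimal sketch of the kind of caching this describes, assuming a per-layer helper that keeps the ALiBi bias around and only rebuilds it when the sequence outgrows it (class and method names here are illustrative, not gpt-neox's actual API):

```python
import torch

class AlibiCache:
    """Illustrative cache for the ALiBi attention bias across generation steps."""

    def __init__(self, num_heads):
        self.num_heads = num_heads
        self.cached_bias = None  # [num_heads, cached_len, cached_len]

    def get_bias(self, seq_len, device):
        # Rebuild only when the cached matrix is too small; otherwise slice it,
        # instead of recomputing from scratch after every generated token.
        if self.cached_bias is None or self.cached_bias.size(-1) < seq_len:
            slopes = self._slopes(device)                    # [num_heads]
            pos = torch.arange(seq_len, device=device)
            rel = (pos[None, :] - pos[:, None]).float()      # [seq_len, seq_len]
            self.cached_bias = slopes[:, None, None] * rel[None, :, :]
        return self.cached_bias[:, :seq_len, :seq_len]

    def _slopes(self, device):
        # Geometric head slopes from the ALiBi paper, assuming num_heads is a power of 2.
        start = 2 ** (-8.0 / self.num_heads)
        return torch.tensor([start ** (i + 1) for i in range(self.num_heads)], device=device)
```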
I'm currently trying to run a configuration on 2 GPUs. It was running okay until I ran into the following error: File "/mnt/home/limsue/anaconda3/envs/GPTNEO/lib/python3.10/site-packages/datasets/table.py", line 21, in wrapper out = wraps(arrow_table_method)(method) File...
**Describe the bug** Setting `"text-gen-type": "interactive"` results in an `IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]`. Other generation types work. **To Reproduce**...
**Describe the bug** RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight: copying a param with shape torch.Size([25216, 6144]) from checkpoint, the shape in current model is torch.Size([50304,...
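The two shapes in that error are consistent with Megatron-style vocabulary padding, where the padded vocab depends on both `make_vocab_size_divisible_by` and the model-parallel size. A hedged illustration of the arithmetic (the function name is mine; the rule is the standard pad-then-partition scheme):

```python
import math

def padded_vocab_size(vocab_size, divisible_by, model_parallel_size):
    # Pad the vocabulary up to a multiple of divisible_by * model_parallel_size;
    # each model-parallel rank then holds an equal slice of the embedding rows.
    multiple = divisible_by * model_parallel_size
    return int(math.ceil(vocab_size / multiple) * multiple)

vocab = 50257  # GPT-2 tokenizer size
print(padded_vocab_size(vocab, 128, 1))       # 50304 -> full embedding [50304, 6144]
print(padded_vocab_size(vocab, 128, 2) // 2)  # 25216 -> per-rank shard  [25216, 6144]
```

If that is what happened here, the checkpoint was saved under a different model-parallel/padding setting than the model being built, and its embedding shards would need to be merged or re-partitioned before loading.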
**Describe the bug** It seems like there's an issue with the dependencies. Error Output ```tex $ pip install -r requirements/requirements.txt Defaulting to user installation because normal site-packages is not writeable...
**Describe the bug** Unable to run evaluate.py with a gpt-neox model trained with pp=0, mp=1. **To Reproduce** Train a 13B model with zero stage 2, pp=0, mp=1. Save checkpoint....
**Describe the bug** Even though temperature is set to 0.0 in the configs, GPT-NeoX's generate.py (interactive mode) produces different outputs for the same input submitted multiple times in...
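For reference, a temperature of exactly 0.0 is normally interpreted as greedy decoding, which is deterministic for a fixed input; differing outputs suggest sampling is still happening somewhere. A minimal sketch of the expected behaviour (names are illustrative, not the repo's actual functions):

```python
import torch

def sample_next_token(logits, temperature=1.0, generator=None):
    if temperature == 0.0:
        # Greedy decoding: always pick the highest-scoring token (deterministic).
        return torch.argmax(logits, dim=-1)
    # Temperature > 0: scale logits and sample, which is stochastic unless a
    # seeded generator is supplied.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=generator).squeeze(-1)
```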
Closes #588
Hi, I am attempting to finetune the 20B model, using the provided `configs/20B.yaml` edited with the following settings: - Dropping `pipe-parallel-size` to 1 - Adding `finetune=true` - Dropping `train_micro_batch_size_per_gpu`...
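A sketch of what the first two overrides could look like, written here as a plain Python dict for illustration (gpt-neox actually reads these keys from the YAML config; the truncated third edit is left out):

```python
finetune_overrides = {
    "pipe-parallel-size": 1,  # single pipeline stage
    "finetune": True,         # load model weights only; do not resume optimizer/iteration state
}
```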