gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
It's often useful when generating samples to be able to get the logits / probabilities of the generated tokens (e.g. for ranking suggestions). It looks like this used to be...
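As an illustration of the use case, here is a minimal sketch (not the repo's API; the tensor shapes and names are hypothetical stand-ins for per-step logits from a generation loop) of how per-token log-probabilities could be recovered and summed into a ranking score:

```python
# Minimal sketch, not the repo's API: recover per-token log-probabilities
# from the logits produced at each generation step.
import torch
import torch.nn.functional as F

vocab_size = 50432                                # illustrative vocabulary size
generated = torch.tensor([[17, 290, 262]])        # hypothetical generated token ids, (batch, steps)
step_logits = torch.randn(1, 3, vocab_size)       # stand-in for the logits at each generation step

log_probs = F.log_softmax(step_logits, dim=-1)                              # normalize over the vocabulary
token_log_probs = log_probs.gather(-1, generated.unsqueeze(-1)).squeeze(-1) # pick out each generated token
sequence_score = token_log_probs.sum(-1)          # e.g. a score for ranking whole suggestions
```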
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. install all requirements 2. download slim weights 3. start...
**Describe the bug** On a 2x3090 system, repeatedly running inference on large contexts (2000+ tokens) will sometimes cause the processes to crash with a CUDA OOM error. With some testing...
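For context on the failure mode, a generic mitigation sketch (not a fix for the underlying issue; `run_inference` is a hypothetical stand-in for the actual inference entry point) that frees cached allocator blocks and retries once after an OOM:

```python
# Hedged sketch: retry an inference call once after releasing cached CUDA memory.
import gc
import torch

def generate_with_retry(run_inference, *args, **kwargs):
    try:
        return run_inference(*args, **kwargs)
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        gc.collect()
        torch.cuda.empty_cache()  # drop cached blocks left over from earlier large contexts
        return run_inference(*args, **kwargs)
```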
First of all, thank you for this repo. It allowed me to start training a GPT model from random initialization. A couple of things that I have noticed: - The...
I am trying to finetune the 20B model on the APPS dataset using the slim weights. The config is identical to the one provided in the repository, with some tweaks (listing them...
This PR removes the old merge script and adds a new one. I assume the config file management PR https://github.com/EleutherAI/gpt-neox/pull/463 will be merged first, so that config files are in...
Hi! Thanks for your contribution in making this repo available :) I tried to train the 13B model with micro batch size 1 and model parallelism degree 8, but I am unable to get it...
**Describe the bug** Unable to convert custom gpt-neox model checkpoints (trained with ZeRO stage 3) using the zero_to_fp32.py script. **To Reproduce** Train a model with ZeRO stage 3, pp=0, mp=1 (haven't...
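For reference, the standalone script wraps a DeepSpeed helper that consolidates the partitioned ZeRO weights into a single fp32 state dict; a hedged sketch of that call (the checkpoint directory and output filename below are hypothetical placeholders):

```python
# Sketch of the DeepSpeed helper behind zero_to_fp32.py; paths are placeholders.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "checkpoints/my-neox-run"  # hypothetical ZeRO stage 3 checkpoint directory
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(state_dict, "pytorch_model_fp32.pt")
```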
The modeling and configuration files are largely based on HF's gpt-j model. (I found gpt-j's architecture more similar to gpt-neox than gpt-neo's, especially since it uses rotary embeddings.) Modifications to the...
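Since rotary embeddings are the distinguishing detail here, a small illustrative sketch of the idea (not the exact HF or gpt-neox implementation): pairs of feature dimensions are rotated by a position-dependent angle before the attention dot product.

```python
# Illustrative rotary position embedding with interleaved (GPT-J-style) pairing.
import torch

def rotary_embed(x, base=10000):
    # x: (batch, seq_len, heads, head_dim) with an even head_dim
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq_len, head_dim/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]                 # even / odd feature pairs
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                          # re-interleave back to head_dim
```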
The original implementation generated samples one by one, which results in low GPU utilization for small models. This commit adds a generating-batch-size config to enable batch generation. Thank you...
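A hedged sketch of the idea behind the change (the `model` callable and shapes are assumptions; the actual commit exposes a generating-batch-size config rather than this helper): one forward pass advances a whole batch of prompts per decoding step instead of looping over them one at a time.

```python
# Illustrative batched greedy decoding step, not the repo's generation code.
import torch

def batched_greedy_step(model, input_ids):
    # model: callable returning logits of shape (batch, seq_len, vocab); assumed interface
    logits = model(input_ids)                      # single forward pass for the whole batch
    next_tokens = logits[:, -1, :].argmax(dim=-1)  # greedy pick per sequence
    return torch.cat([input_ids, next_tokens[:, None]], dim=-1)
```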