
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Results: 203 gpt-neox issues

It's often useful when generating samples to be able to get the logits / probabilities of the generated tokens (e.g. for ranking suggestions). It looks like this used to be...

feature request
help wanted
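
Since this issue asks for the logits/probabilities of generated tokens, here is a minimal sketch of how per-token log-probabilities can be captured during greedy decoding. It assumes an HF-style causal LM whose forward pass returns `.logits` of shape (batch, seq, vocab); it is not gpt-neox's actual sampling API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_with_logprobs(model, input_ids, max_new_tokens=20):
    """Greedy decoding that also records the log-probability of each
    generated token. Illustrative sketch: assumes `model(input_ids)`
    returns an object with a `.logits` tensor (HF-style)."""
    logprobs = []
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits[:, -1, :]       # next-token logits
        token_logprobs = F.log_softmax(logits, dim=-1)
        next_token = token_logprobs.argmax(dim=-1, keepdim=True)
        logprobs.append(token_logprobs.gather(-1, next_token))
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids, torch.cat(logprobs, dim=-1)        # (batch, max_new_tokens)
```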

**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. install all requirements 2. download slim weights 3. start...

bug

**Describe the bug** On a 2x3090 system, repeatedly running inference on large contexts (2000+ tokens) will sometimes cause the processes to crash with a CUDA OOM error. With some testing...

bug
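
A common general-purpose pattern for reducing fragmentation-driven OOMs between repeated large-context inference calls is sketched below. This is a generic PyTorch idiom, not a confirmed fix for this issue.

```python
import gc
import torch

def run_inference(model, input_ids):
    """Illustrative pattern only, not a confirmed fix: skip autograd
    bookkeeping during inference and release cached allocator blocks
    between calls to limit memory fragmentation."""
    try:
        with torch.inference_mode():
            return model(input_ids)
    finally:
        gc.collect()
        torch.cuda.empty_cache()   # return cached blocks to the driver
```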

First of all, thank you for this repo. It allowed me to start training a GPT model from random initialization. A couple of things that I have noticed: - The...

documentation

I am trying to finetune the 20B model on the APPS dataset using the slim weights. The config is identical to the one provided in the repository, with some tweaks (listing them...

bug

This PR removes the old merge script and adds a new one. It assumes the PR for config file management https://github.com/EleutherAI/gpt-neox/pull/463 will be merged first, so that config files are in...

Hi! Thanks for your contribution in making this repo available :) I tried to train the 13B model with micro batch size 1 and model parallelism degree 8, but was unable to get it...

**Describe the bug** Unable to convert checkpoints of a custom gpt-neox model (trained with ZeRO stage 3) using the zero_to_fp32.py script. **To Reproduce** Train a model with ZeRO stage 3, pp=0, mp=1 (haven't...

bug
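
For context, DeepSpeed ships a helper alongside the zero_to_fp32.py script that consolidates ZeRO-partitioned shards into a single fp32 state dict. A sketch of its typical use follows; the checkpoint path is a placeholder, and the exact import path may vary across DeepSpeed versions.

```python
import torch
# Helper shipped with DeepSpeed alongside zero_to_fp32.py; the import
# path may differ between DeepSpeed versions.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Consolidate ZeRO-partitioned parameter shards into one fp32 state dict.
# "checkpoints" is a placeholder for the directory containing the
# `latest` file and the per-step checkpoint subdirectories.
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints")
torch.save(state_dict, "pytorch_model_fp32.bin")
```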

The modeling and configuration files are largely based on HF's gpt-j model. (I found gpt-j's architecture more similar to gpt-neox's than gpt-neo's, especially since it uses rotary embeddings.) Modifications to the...
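
For reference, here is a minimal sketch of the rotary position embedding that gpt-j and gpt-neox share. This is the textbook RoPE formulation, not the port's exact code.

```python
import torch

def rotate_half(x):
    """Swap the two halves of the last dimension, negating the second."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(q, k, seq_len, dim, base=10000.0):
    """Standard rotary position embedding applied to query/key tensors of
    shape (..., seq_len, dim). Sketch of the technique only, not the exact
    code used in this port."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)            # (seq_len, dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)     # (seq_len, dim)
    cos, sin = emb.cos(), emb.sin()
    q_rot = q * cos + rotate_half(q) * sin      # position-dependent rotation
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot
```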

The original implementation generated samples one by one, which results in low GPU utilization for small models. This commit adds a generating-batch-size config to enable batch generation. Thank you...
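
A sketch of the idea behind batched generation: pad the prompts in each chunk to a common length and decode the whole chunk at once instead of looping over prompts individually. The helper below is illustrative only; gpt-neox's actual implementation differs, and a production version would also pass an attention mask so padding tokens are ignored.

```python
import torch

def generate_batched(model, prompts, batch_size, pad_id, max_new_tokens=20):
    """Greedily decode prompts (lists of token ids) in chunks of
    `batch_size` to keep the GPU busier for small models. Illustrative
    sketch: assumes an HF-style model returning `.logits`, and omits the
    attention mask a real implementation would pass for padded tokens."""
    outputs = []
    for i in range(0, len(prompts), batch_size):
        chunk = prompts[i:i + batch_size]
        width = max(len(p) for p in chunk)
        batch = torch.tensor(
            [[pad_id] * (width - len(p)) + p for p in chunk]  # left padding
        )
        for _ in range(max_new_tokens):
            logits = model(batch).logits[:, -1, :]
            next_tok = logits.argmax(dim=-1, keepdim=True)    # greedy step
            batch = torch.cat([batch, next_tok], dim=-1)
        outputs.extend(batch.tolist())
    return outputs
```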