gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
It's often useful when generating samples to be able to get the logits / probabilities of the generated tokens (e.g. for ranking suggestions). It looks like this used to be...
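As an illustration of the use case, here is a minimal sketch (not the repo's API; the tensor shapes and names are hypothetical stand-ins for per-step logits from a generation loop) of how per-token log-probabilities could be recovered and summed into a ranking score:

```python
# Minimal sketch, not the repo's API: recover per-token log-probabilities
# from the logits produced at each generation step.
import torch
import torch.nn.functional as F

vocab_size = 50432                                # illustrative vocabulary size
generated = torch.tensor([[17, 290, 262]])        # hypothetical generated token ids, (batch, steps)
step_logits = torch.randn(1, 3, vocab_size)       # stand-in for the logits at each generation step

log_probs = F.log_softmax(step_logits, dim=-1)                              # normalize over the vocabulary
token_log_probs = log_probs.gather(-1, generated.unsqueeze(-1)).squeeze(-1) # pick out each generated token
sequence_score = token_log_probs.sum(-1)          # e.g. a score for ranking whole suggestions
```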
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. install all requirements 2. download slim weights 3. start...
**Describe the bug** On a 2x3090 system, repeatedly running inference on large contexts (2000+ tokens) will sometimes cause the processes to crash with a CUDA OOM error. With some testing...
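For context on the failure mode, a generic mitigation sketch (not a fix for the underlying issue; `run_inference` is a hypothetical stand-in for the actual inference entry point) that frees cached allocator blocks and retries once after an OOM:

```python
# Hedged sketch: retry an inference call once after releasing cached CUDA memory.
import gc
import torch

def generate_with_retry(run_inference, *args, **kwargs):
    try:
        return run_inference(*args, **kwargs)
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        gc.collect()
        torch.cuda.empty_cache()  # drop cached blocks left over from earlier large contexts
        return run_inference(*args, **kwargs)
```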
First of all, thank you for this repo. It allowed me to start training a GPT model from random initialization. A couple of things that I have noticed: - The...
I am trying to finetune the 20B model on the APPS dataset using the slim weights. The config is identical to the one provided in the repository, with some tweaks (listing them...
This PR removes the old merge script and adds a new one. I assume the config file management PR https://github.com/EleutherAI/gpt-neox/pull/463 will be merged first, so that config files are in...
Hi! Thanks for your contribution in making this repo available :) I tried to train the 13B model with micro batch size 1 and model parallelism degree 8, but I am unable to get it...
**Describe the bug** Unable to convert custom gpt-neox model checkpoints (trained with ZeRO stage 3) using the zero_to_fp32.py script. **To Reproduce** Train a model with ZeRO stage 3, pp=0, mp=1 (haven't...
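For reference, the standalone script wraps a DeepSpeed helper that consolidates the partitioned ZeRO weights into a single fp32 state dict; a hedged sketch of that call (the checkpoint directory and output filename below are hypothetical placeholders):

```python
# Sketch of the DeepSpeed helper behind zero_to_fp32.py; paths are placeholders.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "checkpoints/my-neox-run"  # hypothetical ZeRO stage 3 checkpoint directory
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(state_dict, "pytorch_model_fp32.pt")
```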
The modeling and configuration files are largely based on HF's gpt-j model. (I found gpt-j's architecture more similar to gpt-neox than gpt-neo's, especially since it uses rotary embeddings.) Modifications to the...
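Since rotary embeddings are the distinguishing detail here, a small illustrative sketch of the idea (not the exact HF or gpt-neox implementation): pairs of feature dimensions are rotated by a position-dependent angle before the attention dot product.

```python
# Illustrative rotary position embedding with interleaved (GPT-J-style) pairing.
import torch

def rotary_embed(x, base=10000):
    # x: (batch, seq_len, heads, head_dim) with an even head_dim
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq_len, head_dim/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]                 # even / odd feature pairs
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                          # re-interleave back to head_dim
```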
The original implementation generated samples one by one, which results in low GPU utilization for small models. This commit adds a generating-batch-size config to enable batch generation. Thank you...
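A hedged sketch of the idea behind the change (the `model` callable and shapes are assumptions; the actual commit exposes a generating-batch-size config rather than this helper): one forward pass advances a whole batch of prompts per decoding step instead of looping over them one at a time.

```python
# Illustrative batched greedy decoding step, not the repo's generation code.
import torch

def batched_greedy_step(model, input_ids):
    # model: callable returning logits of shape (batch, seq_len, vocab); assumed interface
    logits = model(input_ids)                      # single forward pass for the whole batch
    next_tokens = logits[:, -1, :].argmax(dim=-1)  # greedy pick per sequence
    return torch.cat([input_ids, next_tokens[:, None]], dim=-1)
```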