
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

203 gpt-neox issues

Looking deeper into the gpt-j residual implementation, I found a difference in the way the layernorm(s) are applied. I don't see the point in applying two separate layer norm modules to...
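
For context, a minimal sketch of the two variants under discussion, assuming hypothetical `attn` and `mlp` submodules (this is not the repo's actual code): the original GPT-J block feeds one shared LayerNorm output to both branches, while applying two separate LayerNorm modules to the same input is the difference noted above.

```python
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of a GPT-J-style parallel residual block.

    shared_ln=True: one LayerNorm output feeds both the attention and MLP
    branches, as in the original GPT-J. shared_ln=False: two separate
    LayerNorm modules are applied to the same input.
    """

    def __init__(self, hidden_size, attn, mlp, shared_ln=True):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden_size)
        self.ln_mlp = self.ln_attn if shared_ln else nn.LayerNorm(hidden_size)
        self.attn = attn
        self.mlp = mlp

    def forward(self, x):
        # Attention and MLP run in parallel off the (normed) input and are
        # added to the residual stream together.
        return x + self.attn(self.ln_attn(x)) + self.mlp(self.ln_mlp(x))
```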

Some of our code is fairly underdocumented, to say the least. Where possible, it would be good to:
- Add input/output type hints to all functions
- Add docstrings...

feature request
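
A small illustration of the requested style, using a hypothetical helper (not a function from the codebase):

```python
from typing import List

def count_tokens(documents: List[str], tokenizer) -> int:
    """Return the total number of tokens across ``documents``.

    Args:
        documents: Raw text documents to tokenize.
        tokenizer: Any tokenizer exposing an ``encode`` method.

    Returns:
        The summed token count over all documents.
    """
    return sum(len(tokenizer.encode(doc)) for doc in documents)
```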

**Describe the bug**
When running `tools/preprocess_data.py` to tokenize my dataset, I was confused about why the generated `.bin` and `.idx` files were empty. It turns out that `lm_dataformat`, the library which...

bug
good first issue
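
For the bug described above, one way to catch the problem early is a quick read-through of the input with `lm_dataformat` before tokenizing. The sketch below assumes a hypothetical dataset path and simply confirms that documents are actually being yielded; otherwise the failure only shows up as empty `.bin`/`.idx` files.

```python
import lm_dataformat as lmd

# Hypothetical input path; substitute the file you pass to tools/preprocess_data.py.
reader = lmd.Reader("./data/mydata.jsonl.zst")

n_docs = 0
for doc in reader.stream_data():
    n_docs += 1
    if n_docs <= 3:
        print(repr(doc[:80]))  # peek at the first few documents

print(f"read {n_docs} documents")
assert n_docs > 0, "lm_dataformat yielded no documents; preprocessing would produce empty files"
```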

Rebased version of https://github.com/EleutherAI/gpt-neox/pull/466. Tested only on 20B.

**Is your feature request related to a problem? Please describe.**
It would be good to remove the Megatron tensor parallelism code from NeoX; [OSLO](https://github.com/tunib-ai/oslo) currently has support for this, and...

feature request
oslo
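
For readers unfamiliar with what would be removed: Megatron-style tensor parallelism shards individual weight matrices across ranks. A conceptual sketch of a column-parallel linear layer follows; it is not the repo's or OSLO's actual API, and it assumes `torch.distributed` has already been initialized.

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    """Conceptual column-parallel linear layer: each rank holds a slice of the
    output dimension, and an all-gather reassembles the full activation.
    Real implementations also handle initialization, bias, and the backward
    pass of the communication, all omitted here."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        self.local = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y_local = self.local(x)  # this rank's slice of the output features
        shards = [torch.empty_like(y_local) for _ in range(dist.get_world_size())]
        dist.all_gather(shards, y_local)  # collect slices from every rank
        return torch.cat(shards, dim=-1)  # full output on each rank
```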

This is not meant to be merged directly. I just wanted to give an example of the changes you need to make to run on AMD GPUs (tested with rocm-4.5.2)...
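
As a rough illustration of the kind of change involved (a sketch, not a patch from this PR): ROCm builds of PyTorch expose `torch.version.hip`, which can be used to switch off CUDA-only fused/custom kernels; the helper name below is hypothetical.

```python
import torch

# torch.version.hip is a version string on ROCm builds and None on CUDA builds.
IS_ROCM = torch.version.hip is not None

def use_fused_kernels() -> bool:
    """Hypothetical switch: fall back to plain PyTorch ops on AMD GPUs
    (e.g. rocm-4.5.2) and keep the fused CUDA extensions elsewhere."""
    return not IS_ROCM
```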

**Is your feature request related to a problem? Please describe.**
Training very large networks takes a lot of time and requires resources that are unavailable to many small...

feature request
help wanted

Thank you for open-sourcing such a great repo for the community! Your work is really helping our team train large pretrained models :) In our experiments, we find...

feature request