gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Looking deeper into the gpt-j residual implementation, I found a difference in how the layer norm(s) are applied. I don't see the point in applying two separate layer norm modules to...
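A minimal sketch of the two variants being compared, in plain PyTorch (the `attn`, `mlp`, and `hidden` arguments are placeholders, not the actual NeoX modules): a GPT-J style parallel residual where attention and MLP either share one LayerNorm or each get their own.

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Illustrative only, not the NeoX implementation: a parallel-residual
    block where the attention and MLP branches read the same residual stream."""

    def __init__(self, hidden, attn, mlp, shared_ln: bool = True):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.ln_1 = nn.LayerNorm(hidden)
        # With a shared norm (GPT-J style) both branches see ln_1(x);
        # otherwise the MLP branch gets its own, separately-learned norm.
        self.ln_2 = self.ln_1 if shared_ln else nn.LayerNorm(hidden)

    def forward(self, x):
        return x + self.attn(self.ln_1(x)) + self.mlp(self.ln_2(x))
```

The only difference between the two configurations is whether `ln_1` and `ln_2` are the same module object or two modules with independent parameters.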
Some of our code is fairly underdocumented, to say the least. Where possible, it would be good to:
- Add input / output type hints to all functions
- Add docstrings...
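A small sketch of the requested style, using a hypothetical helper function (not something in the codebase), with type hints on inputs and outputs plus a docstring:

```python
from typing import List

def count_tokens(documents: List[str]) -> int:
    """Return the total whitespace-separated token count across `documents`.

    Hypothetical helper, shown only to illustrate the requested
    type-hint + docstring convention.
    """
    return sum(len(doc.split()) for doc in documents)
```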
**Describe the bug** When running `tools/preprocess_data.py` to tokenize my dataset, I was confused why the generated `.bin` and `.idx` files were empty. It turns out that `lm_dataformat`, the library which...
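A possible pre-flight check, sketched under the assumption that `lm_dataformat` is installed and `data.jsonl` stands in for your input file: confirm the reader actually yields documents before spending time on tokenization, rather than discovering empty `.bin`/`.idx` files afterwards.

```python
# Sanity check: does lm_dataformat yield any documents from the input?
from lm_dataformat import Reader

reader = Reader("data.jsonl")  # hypothetical input path
n_docs = sum(1 for _ in reader.stream_data())
if n_docs == 0:
    raise RuntimeError("lm_dataformat yielded no documents; "
                       "the resulting .bin/.idx files would be empty")
print(f"found {n_docs} documents")
```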
Rebased version of https://github.com/EleutherAI/gpt-neox/pull/466. Tested only on 20B.
**Is your feature request related to a problem? Please describe.** Would be good to remove the megatron tensor parallelism code from NeoX, and [OSLO](https://github.com/tunib-ai/oslo) currently has support for this, and...
This is not meant to be merged directly. I just wanted to give an example of changes that you need to make to run on AMD GPUs (tested with rocm-4.5.2)....
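One place where such changes typically branch is detecting whether the installed PyTorch is a ROCm (HIP) build, e.g. to skip CUDA-only fused-kernel extensions. A sketch (the flag name `IS_ROCM` and the print messages are illustrative, not from the PR):

```python
import torch

# torch.version.hip is None on CUDA builds and a version string on ROCm builds.
IS_ROCM = torch.version.hip is not None

if IS_ROCM:
    print(f"ROCm build detected (HIP {torch.version.hip}); skipping CUDA-only extensions")
else:
    print(f"CUDA build detected (CUDA {torch.version.cuda})")
```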
**Is your feature request related to a problem? Please describe.** Training very large networks takes a lot of time and requires a lot of resources that are unavailable to many small...
Thank you for open-sourcing such a great repo for the community! Your work is really helping our team train large pretrained models :) In our experiment, we find...
…mizer/blob/master/torch_optimizer/shampoo.py
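For reference, the referenced Shampoo implementation can be exercised standalone via the `torch-optimizer` package; the sketch below is illustrative usage (model, data, and the `lr` value are placeholders, not the PR's settings or NeoX's integration).

```python
import torch
import torch_optimizer

# Toy model and data, used only to show the optimizer's step cycle.
model = torch.nn.Linear(16, 4)
optimizer = torch_optimizer.Shampoo(model.parameters(), lr=1e-2)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```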