gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
1. [AOTAutograd](https://github.com/pytorch/functorch) is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to [OSLO](https://github.com/tunib-ai/oslo/tree/master/oslo/pytorch/kernel_fusion/mem_efficient) recently, and it makes training much faster....
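For context, a minimal sketch of the pattern (not OSLO's actual integration): wrapping a pointwise-heavy function with functorch's AOTAutograd-based `memory_efficient_fusion` so its forward and backward graphs are compiled and fused. The bias-GeLU function below is illustrative.

```python
# Sketch only: fusing a bias-GeLU via functorch's AOTAutograd front end.
# Requires functorch and a CUDA device; the function itself is illustrative.
import torch
from functorch.compile import memory_efficient_fusion

def bias_gelu(bias, x):
    y = x + bias
    return y * 0.5 * (1.0 + torch.tanh(0.79788456 * (y + 0.044715 * y ** 3)))

fused_bias_gelu = memory_efficient_fusion(bias_gelu)  # AOT-compiles fwd + bwd

x = torch.randn(64, 1024, device="cuda", requires_grad=True)
bias = torch.randn(1024, device="cuda", requires_grad=True)
fused_bias_gelu(bias, x).sum().backward()
```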
**TL;DR:** The non-ZeRO ("stage 0") optimizer in DeepSpeed makes fragile assumptions about the optimizer state in the checkpoint, even when the ``finetune: true`` configuration parameter is set. A mitigating factor is...
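For reference, a hypothetical excerpt of the kind of gpt-neox YAML config that hits this path: DeepSpeed ZeRO stage 0 combined with finetuning from an existing checkpoint. The keys are the standard gpt-neox config keys; the checkpoint path is a placeholder.

```yaml
{
  # non-ZeRO optimizer path ("stage 0")
  "zero_optimization": {"stage": 0},

  # load weights from a checkpoint without restoring training state
  # (placeholder path)
  "finetune": true,
  "load": "checkpoints/pretrained"
}
```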
# Overview In order to effectively test any changes to the codebase against the full CUDA / MPI / Apex stack of the repository, it would be nice to dedicate...
ONNX is a common export format for converting models for deployment. **Describe the solution you'd like** A command-line tool that would export the model as a usable ONNX file...
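As a starting point, a hedged sketch of the `torch.onnx.export` call such a tool might wrap; the `Sequential` model is a stand-in for a loaded gpt-neox model, and the shapes, output path, and opset are placeholders.

```python
# Sketch only: the export call a hypothetical ONNX CLI would wrap.
import torch
import torch.nn as nn

# Stand-in for a loaded gpt-neox model (embedding -> logits).
model = nn.Sequential(nn.Embedding(50304, 64), nn.Linear(64, 50304)).eval()
dummy_input_ids = torch.randint(0, 50304, (1, 128))

torch.onnx.export(
    model,
    (dummy_input_ids,),
    "gpt-neox.onnx",                        # placeholder output path
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch", 1: "seq"}},
    opset_version=13,
)
```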
from [DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times](https://www.deepspeed.ai/news/2021/12/09/deepspeed-moe-nlg.html). It should be a fairly simple addition, as [the codebase they open-sourced](https://github.com/microsoft/Megatron-DeepSpeed/tree/moe-training) is largely...
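For a sense of the change involved, a rough sketch (assumed usage of DeepSpeed's public `MoE` layer, not the ported code) of swapping a transformer FFN for a mixture-of-experts layer; the hidden size and expert count are placeholders.

```python
# Sketch only: wrapping a standard FFN as a DeepSpeed MoE layer.
import torch.nn as nn
from deepspeed.moe.layer import MoE

hidden = 1024
ffn = nn.Sequential(
    nn.Linear(hidden, 4 * hidden),
    nn.GELU(),
    nn.Linear(4 * hidden, hidden),
)
# Top-1 gating over 8 experts; MoE.forward returns (output, aux_loss, counts).
moe_ffn = MoE(hidden_size=hidden, expert=ffn, num_experts=8, k=1)
```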
**Describe the bug** While training on The Pile, I was getting errors from sparse attention, claiming that the sequence length wasn't divisible by the block size, despite using a sequence...
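One common workaround while debugging this kind of error (a sketch, not what the repo does internally): right-pad inputs so the sequence dimension is a multiple of the sparse-attention block size.

```python
import torch
import torch.nn.functional as F

def pad_to_block(input_ids: torch.Tensor, block: int) -> torch.Tensor:
    """Right-pad a (batch, seq) tensor so seq is a multiple of `block`."""
    pad = (-input_ids.size(1)) % block
    return F.pad(input_ids, (0, pad)) if pad else input_ids

x = torch.ones(2, 2000, dtype=torch.long)
assert pad_to_block(x, 16).size(1) % 16 == 0
```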
I am trying to import the weights from one of the models I pre-trained using gpt-neox into the transformers library for some downstream tests. I used `AutoModel.from_pretrained(<path to checkpoint>, model_config)`. However...
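For reference, the shape that call presumably needs (a sketch; it assumes the neox checkpoint has already been converted to a Hugging Face-style directory, and the path is a placeholder):

```python
from transformers import AutoConfig, AutoModel

# Placeholder path to a checkpoint already converted to the HF layout.
path = "converted_checkpoints/my-neox-model"
config = AutoConfig.from_pretrained(path)
model = AutoModel.from_pretrained(path, config=config)  # config is a keyword arg
```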
It was raised in https://github.com/EleutherAI/gpt-neox/issues/482#issuecomment-996767144 that the QuickStart default settings aren't actually intended to be used to train a model to completion, and that this is confusing to new users....
**Is your feature request related to a problem? Please describe.** To load gpt-neox models using the HF `AutoModel.from_pretrained` functionality. This will broaden the usage of gpt-neox models within the HF ecosystem....
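The requested end state would presumably look like the usual one-liner (the hub id below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative hub id for an uploaded gpt-neox checkpoint.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```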
**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...
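To make "task-structured data" concrete, a single made-up training instance in the FLAN/T0 prompt-template style (input text and label are invented for illustration):

```python
# Made-up example of an instruction-formatted instance (FLAN/T0 style).
example = {
    "inputs": 'Review: "A thoughtful, slow-burning film." '
              "Is this review positive or negative?",
    "targets": "positive",
}
```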