
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Results: 203 gpt-neox issues

[AOTAutograd](https://github.com/pytorch/functorch) is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to [OSLO](https://github.com/tunib-ai/oslo/tree/master/oslo/pytorch/kernel_fusion/mem_efficient) recently, and it makes training much faster....

feature request
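As a rough illustration of what this could buy gpt-neox, here is a minimal sketch using functorch's `memory_efficient_fusion` entry point on a pointwise bias-GELU chain; the function and shapes are illustrative, not the proposed integration:

```python
# Hedged sketch: memory_efficient_fusion traces the function with AOTAutograd
# and fuses both the forward and backward pass into compiled kernels.
import torch
from functorch.compile import memory_efficient_fusion

def bias_gelu(x, bias):
    # A pointwise chain that AOTAutograd can fuse into a single kernel.
    return torch.nn.functional.gelu(x + bias)

fused = memory_efficient_fusion(bias_gelu)

x = torch.randn(16, 1024, device="cuda", requires_grad=True)
bias = torch.randn(1024, device="cuda", requires_grad=True)
fused(x, bias).sum().backward()
```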

**TL;DR:** The non-ZeRO ("stage 0") optimizer path in DeepSpeed makes fragile assumptions about the optimizer state in the checkpoint, even when the ``finetune: true`` configuration parameter is set. A mitigating factor is...

bug
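A hedged sketch of one workaround under these assumptions (not the fix the issue asks for): when finetuning semantics are wanted, load only the module weights and leave the freshly constructed optimizer state alone. The model, config, and checkpoint path below are placeholders:

```python
# Run under the deepspeed launcher (e.g. `deepspeed this_script.py`).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for the real network
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "zero_optimization": {"stage": 0},  # the non-ZeRO path discussed above
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Load only module weights; keep the freshly constructed optimizer state.
engine.load_checkpoint(
    "checkpoints/",  # hypothetical checkpoint directory
    load_optimizer_states=False,
    load_lr_scheduler_states=False,
)
```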

# Overview

In order to effectively test any changes to the codebase against the repository's full cuda / mpi / apex stack, it would be nice to dedicate...

feature request
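Until such a dedicated runner exists, one assumed-illustrative pattern is to gate GPU-stack tests in pytest so they exercise the full stack where available and skip gracefully elsewhere; the test body below is a made-up example, not a gpt-neox test:

```python
import importlib.util
import pytest
import torch

# Skip markers so the suite runs anywhere but only exercises the GPU stack
# on machines that actually have it.
requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="needs a CUDA device"
)
requires_apex = pytest.mark.skipif(
    importlib.util.find_spec("apex") is None, reason="needs NVIDIA apex"
)

@requires_cuda
@requires_apex
def test_fused_layer_norm_matches_eager():
    from apex.normalization import FusedLayerNorm
    x = torch.randn(8, 1024, device="cuda")
    ref = torch.nn.LayerNorm(1024).cuda()(x)
    fused = FusedLayerNorm(1024).cuda()(x)
    torch.testing.assert_close(fused, ref, atol=1e-3, rtol=1e-3)
```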

ONNX is a common export format for converting models for deployment. **Describe the solution you'd like** A command-line tool that would export the model as a usable ONNX file...

feature request
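A minimal sketch of what such a tool could wrap, using the stock `torch.onnx.export` API on a stand-in module (a real exporter would load the trained gpt-neox weights, likely after conversion to a single-module format):

```python
import torch

class TinyLM(torch.nn.Module):
    """Stand-in for the exported model; a real run would load trained weights."""
    def __init__(self, vocab=50304, hidden=128):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, hidden)
        self.head = torch.nn.Linear(hidden, vocab)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))

model = TinyLM().eval()
dummy = torch.randint(0, 50304, (1, 8))
torch.onnx.export(
    model, (dummy,), "model.onnx",
    input_names=["input_ids"], output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},  # variable batch/seq
    opset_version=13,
)
```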

From [DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times](https://www.deepspeed.ai/news/2021/12/09/deepspeed-moe-nlg.html). It should be a fairly simple addition, as [the codebase they open-sourced](https://github.com/microsoft/Megatron-DeepSpeed/tree/moe-training) is largely...

feature request
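For reference, a minimal sketch of DeepSpeed's `MoE` layer wrapping a standard expert MLP, assuming the `deepspeed.moe.layer.MoE` API from that release; it must run under a distributed launcher so the expert-parallel groups can be created, and the sizes are illustrative:

```python
# Run under a distributed launcher, e.g. `deepspeed this_script.py`.
import torch
import deepspeed
from deepspeed.moe.layer import MoE

deepspeed.init_distributed()

hidden = 1024
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden, 4 * hidden),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden, hidden),
)

# Top-1 gating over 8 experts (illustrative sizes).
moe = MoE(hidden_size=hidden, expert=expert, num_experts=8, k=1)

x = torch.randn(4, 16, hidden)
out, aux_loss, _ = moe(x)  # aux_loss is the router load-balancing term
```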

**Describe the bug** While training on The Pile, I was getting errors from sparse attention, claiming that the sequence length wasn't divisible by the block size, despite using a sequence...

bug
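The constraint behind the error is simply `seq_len % block == 0`; below is a hedged sketch of a right-padding workaround (the block size of 16 is illustrative and would come from the sparsity config in practice):

```python
import torch

def pad_to_block(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Right-pad the seq dim of a [batch, seq, hidden] tensor to a multiple of block."""
    seq = x.size(1)
    pad = (-seq) % block
    return torch.nn.functional.pad(x, (0, 0, 0, pad)) if pad else x

x = torch.randn(2, 100, 64)            # 100 is not divisible by 16
assert pad_to_block(x).size(1) % 16 == 0
```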

I am trying to import the weights from one of the models I pre-trained using gpt-neox into the transformers library for some downstream tests. I used `AutoModel.from_pretrained(path_to_checkpoint, model_config)`. However...
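For what it's worth, `from_pretrained` expects the config as a keyword argument, and the checkpoint directory must already be in HF format (a raw gpt-neox checkpoint needs conversion first); the path below is a placeholder:

```python
from transformers import AutoConfig, AutoModel

# Hypothetical path to a checkpoint already converted to HF format.
path = "path/to/converted_checkpoint"
config = AutoConfig.from_pretrained(path)
model = AutoModel.from_pretrained(path, config=config)
```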

It was raised in https://github.com/EleutherAI/gpt-neox/issues/482#issuecomment-996767144 that the QuickStart default settings aren’t actually intended to be used to train a model to completion, and that this is confusing to new users....

feature request

**Is your feature request related to a problem? Please describe.** Load gpt-neox models using the HF `AutoModel.from_pretrained` functionality. This would broaden the usage of gpt-neox models within the HF eco-system....

feature request
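The desired usage would look roughly like the standard HF flow below; the checkpoint name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative hub checkpoint; any HF-format gpt-neox model would do.
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

ids = tok("EleutherAI is", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=8)[0]))
```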

**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...

feature request
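As a sketch of the data side, FLAN/T0-style finetuning renders each labeled example through a natural-language template into plain text; the template and the `"text"`-keyed jsonl record below are assumptions, matching gpt-neox's usual training-data format:

```python
import json

# Hypothetical template and example; real FLAN/T0 pipelines use many
# templates per task.
template = "Review: {review}\nIs this review positive or negative?\nAnswer: {label}"
example = {"review": "A tense, sharply written thriller.", "label": "positive"}

record = {"text": template.format(**example)}
with open("flan_style_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```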