gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
1. [AOTAutograd](https://github.com/pytorch/functorch) is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to [OSLO](https://github.com/tunib-ai/oslo/tree/master/oslo/pytorch/kernel_fusion/mem_efficient) recently, and it makes training much faster....
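For context, a minimal sketch of the pattern (not OSLO's actual integration): wrapping a pointwise-heavy function with functorch's AOTAutograd-based `memory_efficient_fusion` so its forward and backward graphs are compiled and fused. The bias-GeLU function below is illustrative.

```python
# Sketch only: fusing a bias-GeLU via functorch's AOTAutograd front end.
# Requires functorch and a CUDA device; the function itself is illustrative.
import torch
from functorch.compile import memory_efficient_fusion

def bias_gelu(bias, x):
    y = x + bias
    return y * 0.5 * (1.0 + torch.tanh(0.79788456 * (y + 0.044715 * y ** 3)))

fused_bias_gelu = memory_efficient_fusion(bias_gelu)  # AOT-compiles fwd + bwd

x = torch.randn(64, 1024, device="cuda", requires_grad=True)
bias = torch.randn(1024, device="cuda", requires_grad=True)
fused_bias_gelu(bias, x).sum().backward()
```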
**TL;DR:** The non-ZeRO ("stage 0") optimizer in DeepSpeed makes fragile assumptions about the optimizer state in the checkpoint, even when the ``finetune: true`` configuration parameter is set. A mitigating factor is...
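For reference, a hypothetical excerpt of the kind of gpt-neox YAML config that hits this path: DeepSpeed ZeRO stage 0 combined with finetuning from an existing checkpoint. The keys are the standard gpt-neox config keys; the checkpoint path is a placeholder.

```yaml
{
  # non-ZeRO optimizer path ("stage 0")
  "zero_optimization": {"stage": 0},

  # load weights from a checkpoint without restoring training state
  # (placeholder path)
  "finetune": true,
  "load": "checkpoints/pretrained"
}
```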
# Overview In order to effectively test any changes to the codebase against the full CUDA / MPI / Apex stack of the repository, it would be nice to dedicate...
ONNX is a common export format for converting models for deployment. **Describe the solution you'd like** A command-line tool that would export the model as a usable ONNX file...
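As a starting point, a hedged sketch of the `torch.onnx.export` call such a tool might wrap; the `Sequential` model is a stand-in for a loaded gpt-neox model, and the shapes, output path, and opset are placeholders.

```python
# Sketch only: the export call a hypothetical ONNX CLI would wrap.
import torch
import torch.nn as nn

# Stand-in for a loaded gpt-neox model (embedding -> logits).
model = nn.Sequential(nn.Embedding(50304, 64), nn.Linear(64, 50304)).eval()
dummy_input_ids = torch.randint(0, 50304, (1, 128))

torch.onnx.export(
    model,
    (dummy_input_ids,),
    "gpt-neox.onnx",                        # placeholder output path
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch", 1: "seq"}},
    opset_version=13,
)
```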
from [DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times](https://www.deepspeed.ai/news/2021/12/09/deepspeed-moe-nlg.html). It should be a fairly simple addition, as [the codebase they open-sourced](https://github.com/microsoft/Megatron-DeepSpeed/tree/moe-training) is largely...
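For a sense of the change involved, a rough sketch (assumed usage of DeepSpeed's public `MoE` layer, not the ported code) of swapping a transformer FFN for a mixture-of-experts layer; the hidden size and expert count are placeholders.

```python
# Sketch only: wrapping a standard FFN as a DeepSpeed MoE layer.
import torch.nn as nn
from deepspeed.moe.layer import MoE

hidden = 1024
ffn = nn.Sequential(
    nn.Linear(hidden, 4 * hidden),
    nn.GELU(),
    nn.Linear(4 * hidden, hidden),
)
# Top-1 gating over 8 experts; MoE.forward returns (output, aux_loss, counts).
moe_ffn = MoE(hidden_size=hidden, expert=ffn, num_experts=8, k=1)
```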
**Describe the bug** While training on The Pile, I was getting errors from sparse attention, claiming that the sequence length wasn't divisible by the block size, despite using a sequence...
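One common workaround while debugging this kind of error (a sketch, not what the repo does internally): right-pad inputs so the sequence dimension is a multiple of the sparse-attention block size.

```python
import torch
import torch.nn.functional as F

def pad_to_block(input_ids: torch.Tensor, block: int) -> torch.Tensor:
    """Right-pad a (batch, seq) tensor so seq is a multiple of `block`."""
    pad = (-input_ids.size(1)) % block
    return F.pad(input_ids, (0, pad)) if pad else input_ids

x = torch.ones(2, 2000, dtype=torch.long)
assert pad_to_block(x, 16).size(1) % 16 == 0
```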
I am trying to import the weights from one of the models I pre-trained using gpt-neox into the transformers library for some downstream tests. I used `AutoModel.from_pretrained(<path to checkpoint>, model_config)`. However...
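For reference, the shape that call presumably needs (a sketch; it assumes the neox checkpoint has already been converted to a Hugging Face-style directory, and the path is a placeholder):

```python
from transformers import AutoConfig, AutoModel

# Placeholder path to a checkpoint already converted to the HF layout.
path = "converted_checkpoints/my-neox-model"
config = AutoConfig.from_pretrained(path)
model = AutoModel.from_pretrained(path, config=config)  # config is a keyword arg
```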
It was raised in https://github.com/EleutherAI/gpt-neox/issues/482#issuecomment-996767144 that the QuickStart default settings aren't actually intended to be used to train a model to completion, and that this is confusing to new users....
**Is your feature request related to a problem? Please describe.** To load gpt-neox models using the HF `AutoModel.from_pretrained` functionality. This will broaden the usage of gpt-neox models within the HF ecosystem....
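The requested end state would presumably look like the usual one-liner (the hub id below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative hub id for an uploaded gpt-neox checkpoint.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```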
**Is your feature request related to a problem? Please describe.** FLAN and T0 are two frameworks for finetuning language models on task-structured data. Both papers show significant improvement in LM...
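To make "task-structured data" concrete, a single made-up training instance in the FLAN/T0 prompt-template style (input text and label are invented for illustration):

```python
# Made-up example of an instruction-formatted instance (FLAN/T0 style).
example = {
    "inputs": 'Review: "A thoughtful, slow-burning film." '
              "Is this review positive or negative?",
    "targets": "positive",
}
```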