sdtblck
In https://github.com/EleutherAI/gpt-neox we were previously maintaining two separate model definitions - one for when the user wanted pipeline parallelism, and one for when they didn't. The more straightforward solution was to...
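The text is cut off here, but for context, one way to avoid the duplication is sketched below, assuming DeepSpeed's `PipelineModule`/`LayerSpec` API. The layer contents and function names are made up for illustration; this is not gpt-neox's actual code.

```python
# A minimal sketch (not gpt-neox's actual code): define the model once as a
# flat list of layer specs, then wrap it differently depending on whether
# pipeline parallelism is enabled.
import torch.nn as nn
from deepspeed.pipe import PipelineModule, LayerSpec

def build_layer_specs(num_layers: int, hidden: int):
    # LayerSpec delays construction so PipelineModule can place each layer
    # on the right pipeline stage before instantiating it.
    return [LayerSpec(nn.Linear, hidden, hidden) for _ in range(num_layers)]

def build_model(num_layers: int, hidden: int, pipe_parallel_size: int):
    specs = build_layer_specs(num_layers, hidden)
    if pipe_parallel_size > 1:
        # DeepSpeed partitions the layer list across pipeline stages
        # (requires torch.distributed to be initialized).
        return PipelineModule(layers=specs, num_stages=pipe_parallel_size)
    # Otherwise instantiate the same specs into an ordinary nn.Sequential.
    return nn.Sequential(*(spec.build() for spec in specs))
```

Either way, the layer list is written exactly once, so there is only one model definition to maintain.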
To simulate naive local attention
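The snippet is truncated, but as a rough illustration of what "naive" local attention usually means - full quadratic attention with a banded causal mask, rather than a genuinely windowed kernel - here is a hypothetical sketch; nothing below comes from the original code.

```python
# A minimal sketch: run full softmax attention, but mask out keys more than
# `window` positions back, so compute stays O(n^2) while the receptive
# field becomes local.
import torch

def naive_local_attention(q, k, v, window: int):
    # q, k, v: [batch, heads, seq, head_dim]
    seq = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    i = torch.arange(seq).unsqueeze(1)   # query positions
    j = torch.arange(seq).unsqueeze(0)   # key positions
    # causal + local: each query attends only to keys in (i - window, i]
    mask = (j > i) | (j <= i - window)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```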
I'm attempting to encode runs of multiple consecutive spaces as special tokens (to increase compressibility for documents with many spaces, e.g. code) - but am running into some issues. My tokenizer file...
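For reference, a minimal sketch of the idea, assuming a Hugging Face `tokenizers` tokenizer.json file; the file name and the run-length range are assumptions, not the original setup.

```python
# Add one added-token per run length so e.g. an 8-space indent can be
# covered by a single token instead of many single-space tokens.
from tokenizers import Tokenizer, AddedToken

tokenizer = Tokenizer.from_file("tokenizer.json")  # hypothetical path

# One token per run of 2..24 spaces; longer runs are split greedily.
space_tokens = [AddedToken(" " * n, normalized=False) for n in range(2, 25)]
tokenizer.add_tokens(space_tokens)

ids = tokenizer.encode("def f():\n        return 1").ids
print(ids)  # the 8-space indent should now map to one added token
```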
as well as a few other minor fixes. Untested right now, so please don't merge.
Some of our code is fairly underdocumented, to say the least. Where possible, it would be good to:

- Add input / output typehints to all functions
- Add docstrings...
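As a small example of the intended style (the function itself is made up; only the documentation convention is the point):

```python
# Hypothetical function showing input/output type hints plus a docstring.
from torch import Tensor

def scale_gradients(grads: list[Tensor], loss_scale: float) -> list[Tensor]:
    """Divide each gradient tensor by the current loss scale.

    Args:
        grads: Gradients as produced by the backward pass.
        loss_scale: The (dynamic) loss-scaling factor in use.

    Returns:
        The rescaled gradient tensors, in the same order.
    """
    return [g / loss_scale for g in grads]
```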
**Is your feature request related to a problem? Please describe.** It would be good to remove the Megatron tensor parallelism code from NeoX; [OSLO](https://github.com/tunib-ai/oslo) currently has support for this, and...
from [DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times](https://www.deepspeed.ai/news/2021/12/09/deepspeed-moe-nlg.html). It should be a fairly simple addition, as [the codebase they open-sourced](https://github.com/microsoft/Megatron-DeepSpeed/tree/moe-training) is largely...
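For a sense of the surface area involved, a minimal sketch of DeepSpeed's MoE layer is below; the expert module and the sizes are made up for illustration, and `k=1` (top-1 gating) is an assumption about the intended configuration.

```python
# Wrap an ordinary FFN as the expert and let DeepSpeed's MoE layer
# handle gating and expert dispatch.
import torch.nn as nn
from deepspeed.moe.layer import MoE

hidden = 1024  # hypothetical model width
expert = nn.Sequential(
    nn.Linear(hidden, 4 * hidden),
    nn.GELU(),
    nn.Linear(4 * hidden, hidden),
)

moe_ffn = MoE(
    hidden_size=hidden,
    expert=expert,
    num_experts=8,  # experts per MoE layer (illustrative)
    k=1,            # top-1 gating (assumed)
)
# forward returns (output, aux_loss, exp_counts); the auxiliary loss is
# added to the LM loss to balance expert load.
```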