sdtblck
In https://github.com/EleutherAI/gpt-neox we were previously maintaining two separate model definitions - one for when the user wanted pipeline parallelism, and one for when they didn't. The more straightforward solution was to...
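The text is cut off here, but for context, one way to avoid the duplication is sketched below, assuming DeepSpeed's `PipelineModule`/`LayerSpec` API. The layer contents and function names are made up for illustration; this is not gpt-neox's actual code.

```python
# A minimal sketch (not gpt-neox's actual code): define the model once as a
# flat list of layer specs, then wrap it differently depending on whether
# pipeline parallelism is enabled.
import torch.nn as nn
from deepspeed.pipe import PipelineModule, LayerSpec

def build_layer_specs(num_layers: int, hidden: int):
    # LayerSpec delays construction so PipelineModule can place each layer
    # on the right pipeline stage before instantiating it.
    return [LayerSpec(nn.Linear, hidden, hidden) for _ in range(num_layers)]

def build_model(num_layers: int, hidden: int, pipe_parallel_size: int):
    specs = build_layer_specs(num_layers, hidden)
    if pipe_parallel_size > 1:
        # DeepSpeed partitions the layer list across pipeline stages
        # (requires torch.distributed to be initialized).
        return PipelineModule(layers=specs, num_stages=pipe_parallel_size)
    # Otherwise instantiate the same specs into an ordinary nn.Sequential.
    return nn.Sequential(*(spec.build() for spec in specs))
```

Either way, the layer list is written exactly once, so there is only one model definition to maintain.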
To simulate naive local attention
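The snippet is truncated, but as a rough illustration of what "naive" local attention usually means - full quadratic attention with a banded causal mask, rather than a genuinely windowed kernel - here is a hypothetical sketch; nothing below comes from the original code.

```python
# A minimal sketch: run full softmax attention, but mask out keys more than
# `window` positions back, so compute stays O(n^2) while the receptive
# field becomes local.
import torch

def naive_local_attention(q, k, v, window: int):
    # q, k, v: [batch, heads, seq, head_dim]
    seq = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    i = torch.arange(seq).unsqueeze(1)   # query positions
    j = torch.arange(seq).unsqueeze(0)   # key positions
    # causal + local: each query attends only to keys in (i - window, i]
    mask = (j > i) | (j <= i - window)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```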
I'm attempting to encode runs of multiple consecutive spaces as special tokens (to increase compressibility for documents with many spaces, e.g. code) - but am running into some issues. My tokenizer file...
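For reference, a minimal sketch of the idea, assuming a Hugging Face `tokenizers` tokenizer.json file; the file name and the run-length range are assumptions, not the original setup.

```python
# Add one added-token per run length so e.g. an 8-space indent can be
# covered by a single token instead of many single-space tokens.
from tokenizers import Tokenizer, AddedToken

tokenizer = Tokenizer.from_file("tokenizer.json")  # hypothetical path

# One token per run of 2..24 spaces; longer runs are split greedily.
space_tokens = [AddedToken(" " * n, normalized=False) for n in range(2, 25)]
tokenizer.add_tokens(space_tokens)

ids = tokenizer.encode("def f():\n        return 1").ids
print(ids)  # the 8-space indent should now map to one added token
```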
as well as a few other minor fixes. Untested right now, so please don't merge.
Some of our code is fairly underdocumented, to say the least. Where possible, it would be good to:

- Add input / output typehints to all functions
- Add docstrings...
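As a small example of the intended style (the function itself is made up; only the documentation convention is the point):

```python
# Hypothetical function showing input/output type hints plus a docstring.
from torch import Tensor

def scale_gradients(grads: list[Tensor], loss_scale: float) -> list[Tensor]:
    """Divide each gradient tensor by the current loss scale.

    Args:
        grads: Gradients as produced by the backward pass.
        loss_scale: The (dynamic) loss-scaling factor in use.

    Returns:
        The rescaled gradient tensors, in the same order.
    """
    return [g / loss_scale for g in grads]
```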
**Is your feature request related to a problem? Please describe.** It would be good to remove the Megatron tensor parallelism code from NeoX; [OSLO](https://github.com/tunib-ai/oslo) currently has support for this, and...
from [DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times](https://www.deepspeed.ai/news/2021/12/09/deepspeed-moe-nlg.html). It should be a fairly simple addition, as [the codebase they open-sourced](https://github.com/microsoft/Megatron-DeepSpeed/tree/moe-training) is largely...
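For a sense of the surface area involved, a minimal sketch of DeepSpeed's MoE layer is below; the expert module and the sizes are made up for illustration, and `k=1` (top-1 gating) is an assumption about the intended configuration.

```python
# Wrap an ordinary FFN as the expert and let DeepSpeed's MoE layer
# handle gating and expert dispatch.
import torch.nn as nn
from deepspeed.moe.layer import MoE

hidden = 1024  # hypothetical model width
expert = nn.Sequential(
    nn.Linear(hidden, 4 * hidden),
    nn.GELU(),
    nn.Linear(4 * hidden, hidden),
)

moe_ffn = MoE(
    hidden_size=hidden,
    expert=expert,
    num_experts=8,  # experts per MoE layer (illustrative)
    k=1,            # top-1 gating (assumed)
)
# forward returns (output, aux_loss, exp_counts); the auxiliary loss is
# added to the LM loss to balance expert load.
```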