Stas Bekman issues

Results: 128 issues of Stas Bekman

launch debug code

This is some of the code we wrote to debug tied embed synchronization issues, so pushing it here in case it is needed down the road. Most likely...

possibly --skip-train-iteration-range with multiple entries has a bug

Just noticed in the logs that `--skip-train-iteration-range` reports only a single range when there should be 2. I currently have in the config: `--skip-train-iteration-range 13251-14000 16651-19500` But the...
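To illustrate what a multi-entry `--skip-train-iteration-range` is expected to yield, here is a minimal sketch. It is not the Megatron-DeepSpeed implementation: the use of `nargs="+"` and the helper `parse_iteration_ranges` are assumptions made for illustration only. The point is that both ranges from the config above should survive parsing and be reported.

```python
import argparse


def parse_iteration_ranges(entries):
    """Turn strings like "13251-14000" into (start, end) integer tuples.

    Hypothetical helper, not part of Megatron-DeepSpeed.
    """
    ranges = []
    for entry in entries:
        start_str, end_str = entry.split("-")
        start, end = int(start_str), int(end_str)
        if start > end:
            raise ValueError(f"invalid range {entry!r}: start is greater than end")
        ranges.append((start, end))
    return ranges


parser = argparse.ArgumentParser()
# Assumption: the flag accepts one or more START-END entries.
parser.add_argument("--skip-train-iteration-range", type=str, nargs="+", default=None)

args = parser.parse_args(
    ["--skip-train-iteration-range", "13251-14000", "16651-19500"]
)
skip_ranges = parse_iteration_ranges(args.skip_train_iteration_range)

# Report every range, not just the first one.
for start, end in skip_ranges:
    print(f"skipping iterations {start}-{end}")
```

If only one range shows up in a report like this, the loss is in the parsing or the logging code, not in the config.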

adding consistency calculations/checks at init time

Stella pointed out how they do consistency calculations/checks with NeoX: https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py It'd be good for someone to study what they did over the base Megatron-LM and replicate anything that...

Good First Issue

Good Second Issue

[bnb] resume with more replicas test

A new test to reproduce the issue with BNB when switching from 1 replica to 2 (i.e. the DP degree changes, while the PP and TP degrees stay the same): the original...

Need model size dumped at init

We need to have a diagnostic model size dumped during the framework init. We currently get a report per rank and not the total. ``` > number of parameters on...

Good First Issue

[deepspeed pipe] expand the partitioning method to support weights

We will need to hack https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/pipe/module.py#L378-L384 to support a `partition_method` like `type:embed:2|transformer:1`, or something like that. Now the embed weights will get 2x partitioning weights and will get its own...

Good First Issue
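A rough sketch of how a weighted spec such as `type:embed:2|transformer:1` from the pipe partitioning issue above could be interpreted. This is not DeepSpeed code: the spec syntax is only the one floated in the issue, and `parse_weighted_partition_method`, `layer_weights`, and the layer class names below are hypothetical. It assumes each `|`-separated part is a pattern with an optional integer weight, matched case-insensitively against layer class names.

```python
import re


def parse_weighted_partition_method(method):
    """Parse "type:embed:2|transformer:1" into {"embed": 2, "transformer": 1}.

    Hypothetical helper; a part without an explicit weight defaults to 1.
    """
    prefix = "type:"
    if not method.startswith(prefix):
        raise ValueError(f"unsupported partition method: {method!r}")
    weights = {}
    for part in method[len(prefix):].split("|"):
        pattern, _, weight = part.partition(":")
        weights[pattern] = int(weight) if weight else 1
    return weights


def layer_weights(layer_names, weights):
    """Give each layer the weight of the first pattern matching its name, else 0."""
    result = []
    for name in layer_names:
        layer_weight = 0
        for pattern, weight in weights.items():
            if re.search(pattern, name, re.IGNORECASE):
                layer_weight = weight
                break
        result.append(layer_weight)
    return result


spec = parse_weighted_partition_method("type:embed:2|transformer:1")

# Hypothetical layer class names, for illustration only.
layers = [
    "EmbeddingPipe",
    "ParallelTransformerLayerPipe",
    "ParallelTransformerLayerPipe",
    "EmbeddingPipe",
]
per_layer = layer_weights(layers, spec)
print(per_layer)
```

A partitioner could then balance the sum of these per-layer weights across pipeline stages instead of counting every matching layer as 1.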

Parallelize Meg CUDA Kernel build system

It takes forever to build the Meg CUDA kernels because the build runs sequentially and doesn't take advantage of multiple cores. It takes some 5 minutes to build. And every...

Good First Issue

Good Difficult Issue

[wip] debug with new data

Working on debugging with a live checkpoint (with optim states) but with a small custom dataset.

clone HF's `GPT2` to create `GPTMeg` with a few tiny changes.

As can be seen from https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/121 we have a divergence between Meg and HF GPT2, while using the same weights under fp16. So the proposed solution to enable users to...

Good First Issue

Good Second Issue