Stas Bekman

Results 128 issues of Stas Bekman

This is some of the code we wrote to debug tied-embedding synchronization issues; pushing it here in case it's needed down the road. Most likely...

Just noticed in the logs that `--skip-train-iteration-range` reports only a single range, when there should be two. I currently have in the config:

```
--skip-train-iteration-range 13251-14000 16651-19500
```

But the...
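For reference, here is a minimal sketch of parsing a multi-valued range flag with Python's `argparse`. It assumes `argparse`; the flag's actual definition in the codebase isn't shown in this issue, and `parse_range` is a hypothetical helper.

With `nargs="+"`, every whitespace-separated range is collected into a list. With the default `nargs`, `argparse` takes a single value and rejects the second one as an unrecognized argument.

```python
import argparse
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Hypothetical helper: turn '13251-14000' into (13251, 14000)."""
    start, end = (int(x) for x in text.split("-"))
    if start > end:
        raise argparse.ArgumentTypeError(f"invalid range {text!r}: start > end")
    return start, end


parser = argparse.ArgumentParser()
# nargs="+" keeps every range given after the flag, not just the first one.
parser.add_argument(
    "--skip-train-iteration-range", type=parse_range, nargs="+", default=None
)

args = parser.parse_args(
    ["--skip-train-iteration-range", "13251-14000", "16651-19500"]
)
print(args.skip_train_iteration_range)
```

Logging the length of the parsed list at startup would make a silently dropped range easy to spot.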

Stella pointed out how they do consistency calculations/checks in NeoX: https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py. It'd be good for someone to study what they added on top of the base Megatron-LM and replicate anything that...

Good First Issue
Good Second Issue
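Below is a sketch of the kind of cross-argument consistency checks meant above. It collects every violation instead of stopping at the first one. The `ParallelArgs` fields and the specific constraints are illustrative assumptions, not a copy of what NeoX's `arguments.py` does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelArgs:
    """Hypothetical subset of training arguments, for illustration only."""
    world_size: int
    tp: int  # tensor-parallel degree
    pp: int  # pipeline-parallel degree
    hidden_size: int
    num_attention_heads: int


def consistency_errors(args: ParallelArgs) -> List[str]:
    """Return every violated cross-argument constraint instead of failing on the first."""
    errors = []
    if args.world_size % (args.tp * args.pp) != 0:
        errors.append(
            f"world_size={args.world_size} is not divisible by tp*pp={args.tp * args.pp}"
        )
    if args.hidden_size % args.num_attention_heads != 0:
        errors.append(
            f"hidden_size={args.hidden_size} is not divisible by "
            f"num_attention_heads={args.num_attention_heads}"
        )
    if args.num_attention_heads % args.tp != 0:
        errors.append(
            f"num_attention_heads={args.num_attention_heads} is not divisible by tp={args.tp}"
        )
    return errors


# A deliberately broken config: two of the three constraints fail.
for problem in consistency_errors(ParallelArgs(6, 4, 1, 1000, 16)):
    print("config error:", problem)
```

Reporting all problems at once saves a launch-fail-fix cycle per misconfigured argument on a large cluster.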

A new test to reproduce the issue with BNB when switching from 1 replica to 2 (i.e., the DP degree changes while the PP and TP degrees stay the same): the original...
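Here is the arithmetic behind switching from 1 replica to 2. With TP and PP fixed, the DP degree is `world_size // (tp * pp)`. If the global batch size is held fixed, the gradient-accumulation steps shrink in proportion.

This is a minimal sketch with hypothetical numbers. `parallel_layout` is an illustrative helper, not a function in the codebase.

```python
from typing import Tuple


def parallel_layout(
    world_size: int, tp: int, pp: int, global_batch_size: int, micro_batch_size: int
) -> Tuple[int, int]:
    """Return (dp_degree, gradient_accumulation_steps) for a given GPU count."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)
    samples_per_micro_step = micro_batch_size * dp
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global_batch_size must be divisible by micro_batch_size * dp")
    return dp, global_batch_size // samples_per_micro_step


# Same TP=4, PP=4 and batch sizes; only the number of GPUs (and thus replicas) doubles.
print(parallel_layout(16, 4, 4, global_batch_size=512, micro_batch_size=2))
print(parallel_layout(32, 4, 4, global_batch_size=512, micro_batch_size=2))
```

If DeepSpeed ZeRO is in use, optimizer states are also sharded across the DP ranks. A DP change therefore means those shards must be regrouped when the checkpoint is loaded, which is one reason such a switch is worth covering in a test.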

We need the total model size dumped as a diagnostic during framework init. Currently we get a report per rank, not the total:

```
> number of parameters on...
```

Good First Issue
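Below is a minimal sketch of turning the per-rank reports into one total. It assumes the per-rank counts have already been gathered to one process (in a real run, e.g. via `torch.distributed`, not shown). It also assumes they are keyed by each rank's (DP, PP, TP) coordinates. `total_parameters` is a hypothetical helper, and the counts are made up.

```python
from typing import Dict, Tuple

# Hypothetical input: (dp_rank, pp_rank, tp_rank) -> parameter count printed by that rank.
RankKey = Tuple[int, int, int]


def total_parameters(per_rank: Dict[RankKey, int]) -> int:
    """Total parameters of one model replica.

    Only dp_rank == 0 is counted: data-parallel replicas hold identical copies,
    so summing every rank would multiply the result by the DP degree. Weights
    that live on more than one pipeline stage (e.g. tied embeddings on the
    first and last stage) are still counted once per stage that holds them.
    """
    return sum(count for (dp, _pp, _tp), count in per_rank.items() if dp == 0)


# 2 DP replicas x 2 pipeline stages x 2 tensor-parallel shards (made-up counts).
reports = {
    (dp, pp, tp): (100 if pp == 0 else 80)
    for dp in range(2)
    for pp in range(2)
    for tp in range(2)
}
print(f"total: {total_parameters(reports):,}")
```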

We will need to hack https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/pipe/module.py#L378-L384 to support a `partition_method` like `type:embed:2|transformer:1` (or something like that), so that the embed layer gets 2x the partitioning weight and will get its own...

Good First Issue
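Here is a sketch of what the proposed `type:embed:2|transformer:1` syntax could mean:

- Each layer gets the weight of the first pattern that matches its class name.
- Layers are split into contiguous pipeline stages so that the heaviest stage is as light as possible. This uses a binary search on the per-stage limit plus greedy cuts.

The syntax is only the proposal from this issue, not an existing DeepSpeed option. `parse_weights` and `partition_balanced` are hypothetical helpers written for illustration, DeepSpeed's own partitioning code differs, and the layer names are assumptions.

```python
import re
from typing import List, Sequence


def parse_weights(spec: str, layer_names: Sequence[str]) -> List[int]:
    """Give each layer the weight of the first 'pattern:weight' rule matching its name.

    Hypothetical syntax from the proposal: 'type:embed:2|transformer:1'.
    Patterns are case-insensitive regexes; layers matching no rule get weight 0.
    """
    if spec.startswith("type:"):
        spec = spec[len("type:"):]
    rules = []
    for rule in spec.split("|"):
        pattern, weight = rule.rsplit(":", 1)
        rules.append((re.compile(pattern, re.IGNORECASE), int(weight)))
    return [next((w for rx, w in rules if rx.search(name)), 0) for name in layer_names]


def partition_balanced(weights: Sequence[int], num_stages: int) -> List[int]:
    """Split layers into contiguous stages minimizing the heaviest stage's weight.

    Returns boundaries: stage i owns layers[parts[i]:parts[i + 1]].
    """
    def stages_needed(limit: int) -> int:
        count, current = 1, 0
        for w in weights:
            if current + w > limit:
                count, current = count + 1, w
            else:
                current += w
        return count

    # Smallest per-stage limit that still fits into num_stages stages.
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1

    # Cut greedily with that limit.
    parts, current = [0], 0
    for i, w in enumerate(weights):
        if current + w > lo:
            parts.append(i)
            current = w
        else:
            current += w
    # If fewer cuts were needed than stages, the trailing stages are left empty.
    while len(parts) < num_stages:
        parts.append(len(weights))
    parts.append(len(weights))
    return parts


# Assumed layer layout: tied embedding on the first and last stage, 6 transformer blocks.
layers = ["EmbeddingPipe"] + ["ParallelTransformerLayerPipe"] * 6 + ["EmbeddingPipe"]
weights = parse_weights("type:embed:2|transformer:1", layers)
print(weights, partition_balanced(weights, num_stages=2))
```

With the embedding weighted 2, the two stages in this example balance at a total weight of 5 each (layers 0-3 and 4-7).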

Building the Meg CUDA kernels takes forever (some 5 minutes) because they are compiled sequentially instead of taking advantage of multiple cores. And every...

Good First Issue
Good Difficult Issue
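One way to use multiple cores is to build each kernel in its own worker thread. This is a sketch under assumptions. Threads suffice because a real build spends its time in external compiler processes rather than in Python. `build_all` is a hypothetical helper, and the stub builder stands in for the real per-kernel build call. The kernel names are illustrative.

In practice, each kernel would need its own build directory so concurrent builds don't overwrite each other's files. If the kernels are built through `torch.utils.cpp_extension`, its ninja backend also honors the `MAX_JOBS` environment variable for parallelism within a single kernel.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def build_all(
    kernels: List[str],
    build_one: Callable[[str], str],
    max_workers: Optional[int] = None,
) -> Dict[str, str]:
    """Run build_one(name) for every kernel concurrently; return {name: result}.

    Hypothetical helper. Threads are enough because the heavy lifting happens
    in external compiler processes, and waiting on them releases the GIL.
    """
    workers = max_workers or max(1, min(len(kernels), os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(build_one, name) for name in kernels}
        # .result() re-raises any build failure in the caller.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Stub builder standing in for the real per-kernel compile step (an assumption).
    def fake_build(name: str) -> str:
        return f"built {name}"

    kernels = [
        "scaled_upper_triang_masked_softmax_cuda",
        "scaled_masked_softmax_cuda",
        "fused_mix_prec_layer_norm_cuda",
    ]
    print(build_all(kernels, fake_build))
```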

Working on debugging with a live checkpoint (with optimizer states) but with a small custom dataset.

As can be seen from https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/121, we have a divergence between Meg and HF GPT2 when using the same weights under fp16. So the proposed solution to enable users to...

Good First Issue
Good Second Issue
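This illustrates one generic way identical fp16 weights can still give different outputs; it is not necessarily the cause behind the divergence in PR 121. In half precision, results depend on the order and precision of accumulation.

The sketch emulates fp16 rounding with the standard library's `struct` half-float format (`"e"`). `to_fp16` and `sum_fp16` are illustrative helpers.

```python
import struct
from typing import Iterable


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (via struct's 'e' format)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def sum_fp16(values: Iterable[float]) -> float:
    """Sum values keeping the running total in fp16, as a pure-fp16 reduction would."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


small = [0.0004] * 100  # exact sum is 0.04
# Each 0.0004 is below half an fp16 ulp at 1.0 (about 0.000488), so it is rounded away.
big_first = sum_fp16([1.0] + small)
# Here the small terms accumulate first and only then meet 1.0.
small_first = sum_fp16(small + [1.0])
print(big_first, small_first, 1.0 + sum(small))
```

The same inputs give about 1.0 in one order and about 1.04 in the other. Two implementations that reduce in a different order, or upcast at different points, can therefore drift apart under fp16.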