Stas Bekman

Results: 664 comments by Stas Bekman

Absolutely. Just make a PR that implements what's needed:

1. with a CLI option to activate it
2. an updated dependencies file

and then we will create a target repo on...

The correct syntax is:

```
git clone --single-branch --branch megatron-deepspeed https://huggingface.co/bigscience/gpt2-350m-en
```

reference: https://stackoverflow.com/a/1911126/9201239

or:

```
git clone https://huggingface.co/bigscience/gpt2-350m-en
cd gpt2-350m-en
git checkout megatron-deepspeed
```

the former will download only...

> @stas00 Will the unit test run with the CI? I'm wondering if/whether the test script would have to download the Megatron checkpoints manually on each run. The AWS-based CI...

It appears that this has been fixed in pt-1.10, which has just been released, see: https://github.com/pytorch/pytorch/issues/66067 Which is a bummer since we are on pt-1.8.1 on JZ, but we can...

And you can check whether the key is there before accessing it, and while that is normally a prudent way to code, here it is likely to hide...
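The comment above is cut off, so what exactly would be hidden is not stated. As a minimal sketch of the general trade-off it describes, assuming a hypothetical `state` dict and made-up key names (none of these come from the original thread), compare a guarded lookup with a direct one:

```python
# Hypothetical checkpoint-like dict; the key names are illustrative only.
state = {"model": {"weight": 1.0}}  # note: there is no "optimizer" entry


def load_guarded(state):
    # Guarded access: a missing key is silently skipped and None is returned,
    # so the caller never learns that the entry was absent.
    if "optimizer" in state:
        return state["optimizer"]
    return None


def load_strict(state):
    # Direct access: a missing key raises KeyError at the point of use.
    return state["optimizer"]


print(load_guarded(state))
try:
    load_strict(state)
except KeyError as err:
    print(f"missing key: {err}")
```

The guarded version keeps running with `None`, while the direct version fails right where the key is missing, which is usually easier to debug.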

That's what the args are for, see `get_args` just above where you inserted it. But for now let's hardcode it - once it works we will add an arg. Why...

Indeed, so just `model[0]` in the case of deepspeed; otherwise it'd be an actual list with more than 1 entry. > I used nn.Module.named_children() instead of named_modules() because I thought...

Hmm, not sure why it appears to be modifying in-place - I can't quite see where. Well, the first thing to do is to clone and see if the error...

so does it look like there is a bug in Meg, and it goes unnoticed until we try to look at the data? I'm asking Jared about it. Meanwhile let's...

I pushed a fix to make it work with pt