Stas Bekman comments

Results: 664 comments of Stas Bekman

Make deepspeed.zero.Init() idempotent

@tjruwase, one other thing to be concerned about in the context of the discussion of https://github.com/huggingface/diffusers/pull/3076 is all those `global` objects - these will be a problem when using multiple...

Make deepspeed.zero.Init() idempotent

The single-model concept has been used until now and the problem is that `zero.Init` is integrated into `from_pretrained` so that it can automatically partition the model transparently for the user....

[BUG] Outputs of type NamedTuple cause crash in `_apply_to_tensors_only` (stage 3 + shard parameters)

Totally! I think `NamedTuple` is generic enough to have built-in support. I wonder how many other output types haven't been considered. And Tunji, if you remember we had this...
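To illustrate why `NamedTuple` outputs need their own branch in a recursive "apply to tensors only" helper, here is a minimal sketch. It is not DeepSpeed's `_apply_to_tensors_only`; the names `apply_to_leaves`, `is_namedtuple` and `Output` are hypothetical, and plain floats stand in for tensors so the example runs without PyTorch. The point it shows: a plain `tuple` can be rebuilt from a single iterable, whereas a namedtuple class expects its fields as positional arguments, so treating it like a regular tuple raises a `TypeError`.

```python
from typing import Any, Callable, NamedTuple


def is_namedtuple(obj: Any) -> bool:
    # Namedtuples are tuple subclasses that carry a `_fields` attribute.
    return isinstance(obj, tuple) and hasattr(obj, "_fields")


def apply_to_leaves(fn: Callable[[Any], Any], value: Any,
                    is_leaf: Callable[[Any], bool]) -> Any:
    """Recursively apply `fn` to every leaf, preserving the container types."""
    if is_leaf(value):
        return fn(value)
    if is_namedtuple(value):
        # Must come before the generic tuple branch: the constructor takes
        # one positional argument per field, not a single iterable.
        return type(value)(*(apply_to_leaves(fn, v, is_leaf) for v in value))
    if isinstance(value, (list, tuple)):
        return type(value)(apply_to_leaves(fn, v, is_leaf) for v in value)
    if isinstance(value, dict):
        return {k: apply_to_leaves(fn, v, is_leaf) for k, v in value.items()}
    return value  # anything else is passed through untouched


class Output(NamedTuple):  # hypothetical model output type
    logits: float
    hidden: list


out = Output(logits=1.0, hidden=[2.0, 3.0])
doubled = apply_to_leaves(lambda x: x * 2, out, lambda x: isinstance(x, float))
```

In a real setting the leaf predicate would be something like `torch.is_tensor`, and other output containers (dataclasses, for instance) would need similar dedicated handling.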

[BUG] zero3 memory leak on return from training loop

> @stas00, do you have a specific need for this behavior? Perhaps the solution is to provide an API for the client to explicitly flush the parameter cache. Yes, I'm thinking...

[BUG] zero3 memory leak on return from training loop

Thank you, Tunji. That works for me. Let's merge it. So one problem this caching leads to is that checkpoint save/loading tests might be broken, because the small test data...

[BUG] zero3 memory leak on return from training loop

I'm still trying to sort it out. But the tests aren't testing the right thing in some situations. When running this test: ``` tests/unit/checkpoint/test_zero_optimizer.py -k test_load_module_only[3] ``` If you add...

[REQUEST] sync FusedAdam with the upstream

Thank you, Jeff. And FYI, pytorch-2.0 now has a built-in fused version as well! We have just integrated it into transformers: https://github.com/huggingface/transformers/pull/22144

13B model training OOM with 8x48G machine and limited CPU RAM

I see Tunji has already helped you here a lot, and I will let him follow up on the DS config questions. Just a quick note that DS will be...

13B model training OOM with 8x48G machine and limited CPU RAM

Well, I was very hopeful, but so far no luck. But perhaps it'd work better for you. Basically just add `--torch_compile` to your Trainer arguments after installing pytorch-2.0 (which should...

13B model training OOM with 8x48G machine and limited CPU RAM

OK, a small update: apparently it breaks on dynamic shapes. Can you make all your inputs a fixed length, @lavaaa7? If you can, then I am told it...
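As a rough sketch of what "inputs of a fixed length" means in practice, the snippet below pads or truncates every sequence to the same length so that each batch has an identical shape. The function name `pad_to_fixed_length`, the pad id of `0` and the length of `4` are assumptions made for illustration only. With Hugging Face tokenizers the equivalent is usually to call the tokenizer with `padding="max_length"`, `truncation=True` and a `max_length` value.

```python
from typing import List


def pad_to_fixed_length(token_ids: List[int], max_length: int,
                        pad_id: int = 0) -> List[int]:
    """Truncate or right-pad `token_ids` so the result has exactly `max_length` items."""
    truncated = token_ids[:max_length]
    return truncated + [pad_id] * (max_length - len(truncated))


# Three sequences of different lengths become three sequences of length 4.
batch = [[5, 6, 7], [8], [1, 2, 3, 4, 5, 6]]
fixed = [pad_to_fixed_length(ids, max_length=4) for ids in batch]
```

The trade-off is extra compute spent on padding tokens in exchange for shapes that do not change from batch to batch.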