Stas Bekman

Results 664 comments of Stas Bekman

@tjruwase, one other thing to be concerned about in the context of the discussion of https://github.com/huggingface/diffusers/pull/3076 is all those `global` objects - these will be a problem when using multiple...

The single-model concept has been used until now and the problem is that `zero.Init` is integrated into `from_pretrained` so that it can automatically partition the model transparently for the user....

Totally! I think `NamedTuple` is generic enough to have built-in support. I wonder how many other output types haven't been considered. And Tunji, if you remember we had this...
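To illustrate what generic `NamedTuple` support can look like when walking a module's outputs, here is a minimal sketch in plain Python. It is not DeepSpeed's code; `is_namedtuple`, `map_outputs` and `ExampleOutput` are hypothetical names made up for this example.

```python
from typing import Any, Callable, NamedTuple


def is_namedtuple(obj: Any) -> bool:
    # NamedTuple instances are tuples that also carry a `_fields` attribute.
    return isinstance(obj, tuple) and hasattr(obj, "_fields")


def map_outputs(fn: Callable[[Any], Any], out: Any) -> Any:
    """Apply `fn` to every leaf of a nested output, preserving container types."""
    if is_namedtuple(out):
        # Rebuild with the same class (positional args) so field names survive.
        return type(out)(*(map_outputs(fn, v) for v in out))
    if isinstance(out, (list, tuple)):
        return type(out)(map_outputs(fn, v) for v in out)
    if isinstance(out, dict):
        return {k: map_outputs(fn, v) for k, v in out.items()}
    return fn(out)


class ExampleOutput(NamedTuple):  # hypothetical output type
    logits: list
    loss: float


result = map_outputs(lambda x: x * 2, ExampleOutput(logits=[1, 2], loss=0.5))
```

The `NamedTuple` check has to come before the plain `tuple` check, because a `NamedTuple` class takes its fields as separate positional arguments rather than as a single iterable.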

> @stas00, do you have a specific need for this behavior. Perhaps the solution is to provide an API for client to explicit flush the parameter cache.

Yes, I'm thinking...

Thank you, Tunji. That works for me. Let's merge it. So one problem this caching leads to is that checkpoint save/loading tests might be broken, because the small test data...

I'm still trying to sort it out. But the tests aren't testing the right thing in some situations. When running this test:
```
tests/unit/checkpoint/test_zero_optimizer.py -k test_load_module_only[3]
```
If you add...

Thank you, Jeff. And FYI, pytorch-2.0 now has a built-in fused version as well! We have just integrated it into transformers: https://github.com/huggingface/transformers/pull/22144

I see Tunji has already helped you here a lot, and I will let him follow up on the DS config questions. Just a quick note that DS will be...

Well, I was very hopeful but so far no luck. But perhaps it'd work better for you. Basically just add `--torch_compile` to your Trainer arguments after installing pytorch-2.0 (which should...

OK, a small update: apparently it breaks on dynamic shapes. Can you make all your inputs a fixed length, @lavaaa7? If you can, then I am told it...
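Making every input a fixed length amounts to truncating long sequences and padding short ones to the same size. A minimal sketch in plain Python, assuming token ids are lists of ints; `pad_to_fixed_length` and the `pad_id` value of 0 are hypothetical choices for illustration, not part of any library:

```python
from typing import List


def pad_to_fixed_length(ids: List[int], max_length: int, pad_id: int = 0) -> List[int]:
    """Truncate or right-pad `ids` so the result always has `max_length` items."""
    truncated = ids[:max_length]
    return truncated + [pad_id] * (max_length - len(truncated))


# Sequences of varying length all come out with the same shape.
batch = [[5, 6, 7], [8], [1, 2, 3, 4, 5, 6]]
fixed = [pad_to_fixed_length(x, max_length=4) for x in batch]
```

With a Hugging Face tokenizer, the same effect is usually obtained by calling it with `padding="max_length"`, `truncation=True` and an explicit `max_length`.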