Stas Bekman


excellent. Glad to hear it worked for your case - so some memory was saved there. Do you now have some breathing space to remove some of the settings Tunji suggested...

if this one is a go, please force a CI restart - the HF side has been fixed.

yeah, looks like something broke in transformers again - looking

I'm able to reproduce it once I install `safetensors` - it works without it - investigating. OK, it has to do with the Hub creating safetensors weights - not code related...

@jeffra, @tjruwase - I validated that the transformers test failure is unrelated to this feature. It should be safe to merge. @mayank31398 has been waiting for eons for this to be...

and the transformers job should be fine now if someone can trigger the rebuild.

enable gradient checkpointing to liberate a ton of GPU memory, see: https://github.com/microsoft/DeepSpeed/issues/2797#issuecomment-1423466674. In some cases this allows you to double or quadruple the batch size if you were already able...
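As a minimal sketch of what that looks like with an HF Transformers model (the model name here is illustrative; any `PreTrainedModel` supports this):

```python
from transformers import AutoModelForCausalLM

# illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

# trade compute for memory: activations are recomputed during the
# backward pass instead of being stored, freeing a lot of GPU memory
model.gradient_checkpointing_enable()
```

If you train with the HF `Trainer`, setting `gradient_checkpointing=True` in `TrainingArguments` achieves the same thing.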

Your question is different from this Issue and should be dealt with separately. Could you please open a new Issue where you specify all the details of your setup - e.g....

Any plans to work on that, Tunji? We could have really used that feature in the current 80b m4 training, as we would like to add new parameters that were...

> does it mean deepspeed already supports automatically changing world size for Zero-1,2,3 if I use bf16

No, it doesn't. It's currently a confusing situation, as `BF16Optimizer` was written specifically...
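For context, a minimal sketch of a DeepSpeed config that enables bf16 alongside a ZeRO stage (the stage and batch-size values are illustrative, and which optimizer implementation actually gets used depends on the stage):

```python
# passed to deepspeed.initialize() or via the HF Trainer's `deepspeed` argument
ds_config = {
    "bf16": {"enabled": True},          # bf16 mixed precision
    "zero_optimization": {"stage": 2},  # ZeRO stage 1, 2 or 3
    "train_micro_batch_size_per_gpu": 1,
}
```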