Stas Bekman


excellent. Glad to hear it worked for your case - so some memory was saved there. Do you now have some breathing space to remove some of the settings Tunji suggested...

if this one is a go, please force a CI restart - the HF side has been fixed.

yeah, looks like something broke in transformers again - looking

I'm able to reproduce it once I install `safetensors` - it works without it - investigating. OK, it has to do with the Hub creating safetensors weights - not code related...

@jeffra, @tjruwase - I validated that the transformers test failure is unrelated to this feature. It should be safe to merge. @mayank31398 has been waiting for eons for this to be...

and the transformers job should be fine now if someone can trigger the rebuild.

enable gradient checkpointing to liberate a ton of GPU memory, see: https://github.com/microsoft/DeepSpeed/issues/2797#issuecomment-1423466674. In some cases this allows you to double or quadruple the batch size if you were already able...
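As a minimal sketch of what that looks like with an HF Transformers model (the model name here is illustrative; any `PreTrainedModel` supports this):

```python
from transformers import AutoModelForCausalLM

# illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

# trade compute for memory: activations are recomputed during the
# backward pass instead of being stored, freeing a lot of GPU memory
model.gradient_checkpointing_enable()
```

If you train with the HF `Trainer`, setting `gradient_checkpointing=True` in `TrainingArguments` achieves the same thing.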

Your question is different from this Issue and should be dealt with separately. Could you please open a new Issue where you specify all the details of your setup - e.g....

Any plans to work on that, Tunji? We could have really used that feature in the current 80b m4 training, as we would like to add new parameters that were...

> does it mean deepspeed already supports automatically changing world size for Zero-1,2,3 if I use bf16

No, it doesn't. It's currently a confusing situation, as `BF16Optimizer` was written specifically...
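For context, a minimal sketch of a DeepSpeed config that enables bf16 alongside a ZeRO stage (the stage and batch-size values are illustrative, and which optimizer implementation actually gets used depends on the stage):

```python
# passed to deepspeed.initialize() or via the HF Trainer's `deepspeed` argument
ds_config = {
    "bf16": {"enabled": True},          # bf16 mixed precision
    "zero_optimization": {"stage": 2},  # ZeRO stage 1, 2 or 3
    "train_micro_batch_size_per_gpu": 1,
}
```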