Stas Bekman
https://github.com/huggingface/blog/pull/538
My tests were run on the JeanZay HPC, so it's possible their servers simply have beefier hardware. It is interesting that you both report the same speed with int8. @RezaYazdaniAminabadi, do...
Unfortunately I no longer have access to JeanZay, so I can't retrieve any more data at the moment.

> Could this be due to slow communication between GPUs?

That's very...
The initial topology conversion was written for BF16Optimizer, but here you use ZeRO stage=1, which I haven't worked with, so I have no experience with this use case. Tagging @tjruwase who...
Honestly I'm not sure as I wasn't part of the data team. I remember they said that most likely the normal tokenizer should work, but it might be safer to...
> Could you please provide more details about the training of 1B7 or 3B or 7B1 models?

I only worked on 176B, so I'm not the right person to ask....
A small correction: that's not Apex, but DeepSpeed's top-level optimizer doing the skipping.
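For context, the "skipping" in question is the usual dynamic loss-scaling behaviour in fp16 training: if an overflow (inf/NaN) shows up in the gradients, the optimizer step is skipped and the loss scale is reduced. A minimal sketch of that general mechanism (illustrative only, not DeepSpeed's actual code; all names here are made up):

```python
import torch

class DynamicLossScaler:
    """Illustrative dynamic loss scaler: skip the step on overflow,
    halve the scale, and grow it back after a run of good steps."""

    def __init__(self, init_scale=2.0**16, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def step(self, optimizer, params):
        # Check the (already unscaled) gradients for inf/NaN.
        overflow = any(
            p.grad is not None and not torch.isfinite(p.grad).all()
            for p in params
        )
        if overflow:
            # Skip this optimizer step and shrink the loss scale.
            optimizer.zero_grad(set_to_none=True)
            self.scale /= 2.0
            self.good_steps = 0
            return False
        optimizer.step()
        self.good_steps += 1
        if self.good_steps % self.growth_interval == 0:
            self.scale *= 2.0  # cautiously grow the scale back
        return True
```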
Please see: https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#checkpoint-reshaping I know it's not generic at the moment; please let me know if you run into any difficulties following those instructions while adapting them to your situation. @tjruwase,...
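In case it helps to see the idea behind those instructions, reshaping tensor-parallel checkpoints essentially means concatenating matching weight shards along the dimension they were split on: column-parallel layers along the output dimension, row-parallel layers along the input dimension. A toy sketch with made-up file names, not a substitute for the linked procedure:

```python
import torch

# Toy sketch of merging tensor-parallel shards (hypothetical file names).
tp_ranks = 4
shards = [torch.load(f"layer_01_shard_{r}.pt") for r in range(tp_ranks)]

# Column-parallel weights were split along the output dimension (dim 0),
# row-parallel weights along the input dimension (dim 1).
merged = {
    "mlp.dense_h_to_4h.weight": torch.cat(
        [s["mlp.dense_h_to_4h.weight"] for s in shards], dim=0),
    "mlp.dense_4h_to_h.weight": torch.cat(
        [s["mlp.dense_4h_to_h.weight"] for s in shards], dim=1),
}
torch.save(merged, "layer_01_merged.pt")
```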
Only 176B was trained on A100s and thus bf16 (z?); everything else was trained on V100s, thus fp16, thus z1.
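To make the fp16/bf16 + ZeRO distinction concrete, here is roughly what the two DeepSpeed config variants look like (illustrative fragments only, not the actual BLOOM training configs):

```python
# Illustrative DeepSpeed config fragments (not the actual BLOOM configs).

# V100 runs: fp16 with dynamic loss scaling + ZeRO stage 1.
ds_config_fp16_z1 = {
    "fp16": {"enabled": True, "loss_scale": 0},  # 0 = dynamic loss scaling
    "zero_optimization": {"stage": 1},
}

# A100 run (176B): bf16, which needs no loss scaling.
ds_config_bf16 = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},  # stage shown here is an assumption
}
```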
I wasn't part of this training. @TevenLeScao, do you by chance know who did the bloom-3b training? And is it possible to update to deepspeed@master in the conda env,...