Stas Bekman

Results 664 comments of Stas Bekman

Can we try again please, Tunji?

I just merged something on transformers a few hours ago - perhaps we broke something there - checking your CI

It looks like someone broke the `zero_to_fp32.py` file - it now starts with:
```
'''Copyright The Microsoft DeepSpeed Team'''
#!/usr/bin/env python
```
should be starting with:
```
#!/usr/bin/env python
```
...

This is the source of the breakage: https://github.com/microsoft/DeepSpeed/pull/2889 - @jeffra, if I revert that commit, everything works. If you merge https://github.com/microsoft/DeepSpeed/pull/2909 it should work.

`7*(8+4)=84` - so you should have an 84GB universal checkpoint. (8+4: optim states + weights) Can you check with `du -s` where the bloat is coming from? e.g. ``` du -ahd1 /path...
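The estimate above can be sketched as a small helper. It assumes that the `7` stands for 7 billion parameters and that `8` and `4` are bytes per parameter for optimizer states and weights respectively; the function name `expected_checkpoint_gb` is hypothetical.

```python
def expected_checkpoint_gb(params_billions, optim_bytes_per_param=8, weight_bytes_per_param=4):
    """Estimate checkpoint size in GB (10^9 bytes).

    Assumption: billions of parameters times bytes per parameter gives GB
    directly, ignoring any metadata overhead.
    """
    return params_billions * (optim_bytes_per_param + weight_bytes_per_param)


print(expected_checkpoint_gb(7))  # 84
```

Anything well above this number on disk would point at bloat rather than at real state.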

Thank you for the details, @LiinXemmon. @tjruwase, is it possible we are hitting the tensor bloat here as well, same as you're fixing in https://github.com/microsoft/DeepSpeed/pull/3348? @LiinXemmon, could you please confirm...

> The sweep results suggest that zero-infinity can be configured to do offloading at read rate of 3GB/sec and write rate of 2.6GB/sec. So you want to configure the [asynchronous...

I looked again through your paper - please correct me if I'm wrong, but it looks like we need at least 3GB/s NVME<->CPU bandwidth per GPU, so really my NVME...

This helps a lot, thank you! Can we make `parse_aio_stats.py` take in both read and write reports and generate the recommended config for the user?
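A rough sketch of what such a combined recommendation could look like. This is not the actual `parse_aio_stats.py` logic: the input record layout, the `gb_per_sec` field, the function name `recommend_aio_config`, and the selection policy (pick the parameter combination whose slower direction is fastest) are all assumptions made for illustration. Only the `aio` config key names follow the DeepSpeed config. The throughput numbers in the example are made up.

```python
import json

# Key names of the "aio" section in a DeepSpeed config.
AIO_KEYS = ("block_size", "queue_depth", "thread_count", "single_submit", "overlap_events")


def recommend_aio_config(read_runs, write_runs):
    """Pick one aio setting from read and write sweep results.

    Each run is a dict with the AIO_KEYS plus a hypothetical 'gb_per_sec' field.
    Policy (an assumption): maximize min(read, write) throughput over the
    parameter combinations that appear in both reports.
    """
    def index(runs):
        return {tuple(r[k] for k in AIO_KEYS): r["gb_per_sec"] for r in runs}

    read, write = index(read_runs), index(write_runs)
    common = read.keys() & write.keys()
    if not common:
        raise ValueError("no parameter combination appears in both reports")
    best = max(common, key=lambda combo: min(read[combo], write[combo]))
    return {"aio": dict(zip(AIO_KEYS, best))}


# Made-up sweep results, for illustration only.
setting_a = dict(block_size=1048576, queue_depth=8, thread_count=1,
                 single_submit=False, overlap_events=True)
setting_b = dict(block_size=262144, queue_depth=16, thread_count=1,
                 single_submit=False, overlap_events=True)
read_runs = [dict(setting_a, gb_per_sec=3.0), dict(setting_b, gb_per_sec=2.0)]
write_runs = [dict(setting_a, gb_per_sec=2.6), dict(setting_b, gb_per_sec=2.8)]

print(json.dumps(recommend_aio_config(read_runs, write_runs), indent=2))
```

Other policies are possible, for example weighting reads more heavily than writes, so the scoring function is the part most worth making configurable.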