Stas Bekman

Results: 664 comments by Stas Bekman

> > I looked again through your paper - please correct me if I'm wrong, but it looks like we need at least 3GB/s NVME<->CPU bandwidth per GPU, so really...

With the updated benchmark I get slightly worse results than before for write (was 2.59), and it has now switched to `single_submit = true` as the best ``` python parse_aio_stats.py...

@tjruwase, what do you think about putting the `400MB` column last or removing it completely - since it's always the same number it doesn't tell anything to the user? Then...

> @stas00, regarding the changing best configs and results, I think since the perf differences are so small I would put it down to noise. Also, as you notice the...

Oh, and we should probably give the instruction as `sudo ./run_read_sweep.sh input.file read-logs`, so the `sudo` prompt doesn't come as a strange surprise after the script has started.

OK, so adding cache invalidation for write had a negligible impact (a difference on the order of 1e-2). Hence, it's probably safe to skip it (as it slows the overall run time as...

> 3\. I agree that reducing the search space is critical, as @stas00 already noted. However, your results, which show `sequential > overlap`, deviate from our observations, making it harder...

I added "4. Contribute your data" instructions to the OP - let's see if we can get some contributions. I also posted a call to the community inviting contributions: https://discuss.huggingface.co/t/deepspeed-zero-infinity-looking-for-nvme-device-benchmarks/5787

Thank you for wanting to help us to gather the data, @thefazzer! I have the same card, it works without problems if you have the right torch/cuda setup. Let's not...

@tjruwase, could we please change the benchmark to bail out with an error message if it can't run? Otherwise it dumps the error into the benchmark files, and it will...
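To illustrate the "bail out instead of logging the error" behaviour requested above, here is a minimal sketch in Python. It is not the benchmark's actual code: `run_or_bail`, `cmd`, and `log_path` are hypothetical names, and the sketch simply assumes that each benchmark run is a subprocess whose output is saved to a log file.

```python
import subprocess
import sys


def run_or_bail(cmd, log_path):
    """Run one benchmark command; write its log only if it succeeded.

    Hypothetical helper for illustration, not part of any real benchmark.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Fail fast: surface the error to the user instead of
        # saving it into a file that later gets parsed as results.
        raise SystemExit(
            f"benchmark failed (exit code {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    with open(log_path, "w") as f:
        f.write(result.stdout)
    return result.stdout
```

With this shape, a failing run stops the sweep immediately and leaves no log file behind, so a later parsing step never sees error text mixed in with measurements.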