Wrap up the distributed benchmarks and get them running on conda_mast
We would like to introduce basic distributed benchmarking support on synthetic data.
The idea is to wrap the existing single-GPU models in DDP/FSDP, then get them running on conda_mast.
The initial OSS distributed userbenchmark could be used as a starting point:
https://github.com/pytorch/benchmark/tree/main/userbenchmark/distributed
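A minimal sketch of the wrapping step, assuming a torchrun-style launcher and a hypothetical `TinyModel` standing in for a single-GPU TorchBench model (this is not the actual userbenchmark code, just an illustration of wrapping a model in DDP and training on synthetic data):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyModel(nn.Module):
    """Hypothetical stand-in for a single-GPU TorchBench model."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


def run(local_rank: int):
    # One process per GPU; torchrun sets RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = TinyModel().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # FSDP could be swapped in here

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Synthetic data: random inputs/targets, no dataset or dataloader needed.
    x = torch.randn(32, 1024, device=local_rank)
    y = torch.randn(32, 1024, device=local_rank)

    for _ in range(10):  # a few iterations that the benchmark would time
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    run(int(os.environ["LOCAL_RANK"]))
```

Launched with, e.g., `torchrun --nproc_per_node=8 train.py`; the same structure should carry over to a conda_mast launch once the environment is set up.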