torchtitan
Benchmark SymmMem's all_to_all_vdev_2d on NVL72
- Check the functionality of all_to_all_vdev_2d on NVL72 and document anything missing.
- Benchmark whether the kernel reaches the expected NVLink bandwidth.
- Integrate into torchtitan for EP communication.
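For the bandwidth item, a minimal sketch of how a measured timing could be turned into a comparison against peak. It deliberately does not call all_to_all_vdev_2d. The helper names and the numbers in the usage lines are hypothetical placeholders, and the peak value has to come from the hardware spec for the actual system.

```python
def achieved_bandwidth_gbps(bytes_moved: int, elapsed_s: float) -> float:
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_moved / elapsed_s / 1e9


def bandwidth_efficiency(achieved_gbps: float, peak_gbps: float) -> float:
    """Fraction of the expected peak bandwidth that was reached."""
    if peak_gbps <= 0:
        raise ValueError("peak_gbps must be positive")
    return achieved_gbps / peak_gbps


if __name__ == "__main__":
    # Hypothetical placeholders, not measurements: substitute the bytes
    # each rank actually sends and the averaged kernel time.
    bytes_per_rank = 2**30      # assumed payload size
    elapsed_s = 2e-3            # assumed averaged time per call
    assumed_peak_gbps = 900.0   # assumption: replace with the spec value

    bw = achieved_bandwidth_gbps(bytes_per_rank, elapsed_s)
    eff = bandwidth_efficiency(bw, assumed_peak_gbps)
    print(f"{bw:.1f} GB/s, {eff:.1%} of assumed peak")
```

The elapsed time should be an average over many iterations after warm-up, measured with GPU-side timing, so that launch overhead and first-call setup do not skew the result.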
CC: @kwen2501
@syed-ahmed FYI, I had a draft PR for the integration: https://github.com/pytorch/torchtitan/pull/1569. I hit some issues and haven't had time to revisit it since.
Thanks @tianyu-l! I'll try to take a look at your PR.