flux
[QUESTION] Some questions about allgather+gemm
I'm puzzled by several questions.
- I noticed multiple ring modes in allgather-gemm, such as all_to_all or ring, with pull or push. How does the ring mode affect performance on PCIe?
3. Why is local_copy disabled in cross-node allgather-gemm?
4. Is there any performance difference between using P2P and using NVSHMEM?
5. When the chunk sizes are not all the same, can flux handle this case?
Thank you for your generous help.
- For PCIe machines, ring mode is the better choice. All-to-all is intended for NVLink.
- No.
- local_copy is not disabled, is it?
- There should not be any difference. If you find a gap, please report it.
- I don't understand your question. Could you make it clearer?
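
To illustrate the first answer (ring mode for PCIe, all-to-all for NVLink), here is a minimal sketch of the two communication patterns in plain Python. It is not flux code: the function names `ring_pull_schedule`, `all_to_all_pull_schedule`, `simulate`, and `max_incoming_per_rank` are hypothetical, and the model only counts transfers without measuring bandwidth. One common rationale, assumed here, is that all-to-all makes every rank pull from all peers at once, which suits a fully connected NVLink fabric but tends to contend on shared PCIe links, while a ring keeps each rank to a single incoming transfer per step.

```python
# Illustrative model only: these helpers are hypothetical and are not part of flux.
# A transfer is a tuple (src_rank, dst_rank, chunk_id); a schedule is a list of steps.


def ring_pull_schedule(world_size):
    """Ring pull: at every step each rank pulls one chunk from rank + 1."""
    steps = []
    for step in range(world_size - 1):
        transfers = []
        for rank in range(world_size):
            src = (rank + 1) % world_size
            # The neighbor forwards the chunk it received in the previous step.
            chunk = (rank + 1 + step) % world_size
            transfers.append((src, rank, chunk))
        steps.append(transfers)
    return steps


def all_to_all_pull_schedule(world_size):
    """All-to-all pull: in a single step each rank pulls directly from every peer."""
    transfers = [
        (src, rank, src)
        for rank in range(world_size)
        for src in range(world_size)
        if src != rank
    ]
    return [transfers]


def simulate(schedule, world_size):
    """Replay a schedule and return the set of chunks each rank ends up holding."""
    held = [{rank} for rank in range(world_size)]
    for transfers in schedule:
        received = [set() for _ in range(world_size)]
        for src, dst, chunk in transfers:
            # A rank can only serve a chunk it held before this step started.
            assert chunk in held[src], "source does not hold the requested chunk"
            received[dst].add(chunk)
        for rank in range(world_size):
            held[rank] |= received[rank]
    return held


def max_incoming_per_rank(schedule, world_size):
    """Largest number of simultaneous incoming transfers seen by any rank."""
    worst = 0
    for transfers in schedule:
        incoming = [0] * world_size
        for _, dst, _ in transfers:
            incoming[dst] += 1
        worst = max(worst, max(incoming))
    return worst


if __name__ == "__main__":
    n = 8
    for name, build in (("ring", ring_pull_schedule), ("all_to_all", all_to_all_pull_schedule)):
        schedule = build(n)
        print(name, "steps:", len(schedule), "max incoming:", max_incoming_per_rank(schedule, n))
```

Both schedules complete the same allgather. The difference is that the ring spreads it over `world_size - 1` steps with one incoming transfer per rank, whereas all-to-all finishes in one step with `world_size - 1` concurrent incoming transfers per rank.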