flux
[QUESTION] Is it possible to use splitK kernel in AG mode to overlap comm and gemm?
- I am using flux v1.0.4 to overlap GEMM and communication. In AG mode, I see that flux uses a stream-K kernel based on CUTLASS.
- But on my GPU, split-K kernels perform better, so I want to use split-K kernels instead of stream-K kernels.
So can you tell me whether it is possible to overlap communication and GEMM with split-K kernels?
Yes, it's possible in theory, but not implemented.
I'm not aware of cases where split-K is faster than stream-K. Can you provide some cases (problem shapes, GPU model) where split-K is faster?
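
For anyone putting together such a comparison, here is a rough sketch of how the two schemes divide GEMM work, which may help in picking shapes worth benchmarking. It is plain Python with hypothetical helper names (`split_k_work`, `stream_k_work`); it is not flux or CUTLASS code and only models how K-loop iterations are assigned, assuming every output tile has the same number of K iterations.

```python
def split_k_work(num_tiles, iters_per_tile, splits):
    """Split-K model: each output tile's K loop is cut into `splits` slices.

    Returns one (tile, k_start, k_end) work unit per slice. Partial results
    for the same tile must be reduced afterwards.
    """
    base, extra = divmod(iters_per_tile, splits)
    work = []
    for tile in range(num_tiles):
        start = 0
        for s in range(splits):
            length = base + (1 if s < extra else 0)
            work.append((tile, start, start + length))
            start += length
    return work


def stream_k_work(num_tiles, iters_per_tile, num_ctas):
    """Stream-K model: the total iteration space is divided evenly over CTAs.

    Returns one (start, end) range per CTA in the flattened iteration space,
    where iteration i belongs to tile i // iters_per_tile. A range may cross
    tile boundaries, so a CTA can contribute to more than one tile.
    """
    total = num_tiles * iters_per_tile
    base, extra = divmod(total, num_ctas)
    ranges = []
    start = 0
    for cta in range(num_ctas):
        length = base + (1 if cta < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


if __name__ == "__main__":
    # Hypothetical shape: 3 output tiles, 8 K iterations each, 4 CTAs.
    print(split_k_work(3, 8, 2))
    print(stream_k_work(3, 8, 4))
```

With these made-up numbers, split-K with 2 slices yields 6 equal work units for 4 CTAs, so a second, partly idle wave is needed, while the stream-K model gives each CTA 6 iterations. Real kernel performance also depends on reduction cost, tile shape, and memory behavior, which this model ignores, so measured cases on the actual GPU are still what matters.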