zfy3000

6 comments by zfy3000

Does it support NCCL, IB, and TCP networks at the same time? Thanks very much!

> Thanks for your interest! [@YJHMITWEB](https://github.com/YJHMITWEB) How do you run the multi-node test? Via the launch script we provided, or just torchrun? If you check the launch.sh under script...

Does NVSHMEM support multi-machine P2P? Thanks! @wenlei-bao

> I have read articles about Flux and noticed that the paper mentions a TP+SP approach in Transformer, not pure TP. To confirm: During the decoding phase of the inference stage,...

@wenlei-bao Thank you for your reply. May I ask whether CUTLASS supports automatic padding of unaligned matrices? I see that PyTorch runs normally in the above case....

Code path: include\flux\cuda\gemm_impls\gemm_grouped_impl.hpp

    // Parse template parameters
    static constexpr auto dt_conf = to_gemm_dtype_config(make_gemm_dtype_config(meta.dtype()));
    using ElementA = decltype(to_cutlass_element(dt_conf.a()));
    using ElementB = decltype(to_cutlass_element(dt_conf.b()));
    using ElementC = decltype(to_cutlass_element(dt_conf.c()));
    using ElementD = decltype(to_cutlass_element(dt_conf.d()));
    using...