zfy3000
Does it support NCCL, IB, and TCP networks at the same time? Thanks very much!
> Thanks for your interest! [@YJHMITWEB](https://github.com/YJHMITWEB) How do you run the multi-node test: via the launch script we provided, or just torchrun? If you check the launch.sh under script...
Does NVSHMEM support multi-machine P2P? Thanks! @wenlei-bao
> I have read articles about Flux and noticed that the paper mentions a TP+SP approach in the Transformer, not pure TP. To confirm: During the decoding phase of the inference stage,...
@wenlei-bao Thank you for your reply. May I ask whether CUTLASS supports automatic padding of unaligned matrices? I see that PyTorch runs normally in the above case....
code path: include\flux\cuda\gemm_impls\gemm_grouped_impl.hpp

    // Parse template parameters
    static constexpr auto dt_conf = to_gemm_dtype_config(make_gemm_dtype_config(meta.dtype()));
    using ElementA = decltype(to_cutlass_element(dt_conf.a()));
    using ElementB = decltype(to_cutlass_element(dt_conf.b()));
    using ElementC = decltype(to_cutlass_element(dt_conf.c()));
    using ElementD = decltype(to_cutlass_element(dt_conf.d()));
    using...