Shixuan Zheng comments


Results: 2 comments of Shixuan Zheng

[QUESTION] Hello, how should flux/test/python/moe_ag_scatter/test_moe_ag.py be changed to set tp size=2 on a machine with 8 GPUs?

Just change the code in launch.sh: set nproc_per_node=2.

[QUESTION] Why use Remote TMA Load for gemm_rs sm90 implementations?

It turned out that this implementation leads to worse performance than no-fusion:

./launch.sh test/python/gemm_rs/test_gemm_rs.py 8192 12288 8192 --dtype=float16 --iters=10

torch #0: gemm 0.557 ms, comm 1.009 ms, total 1.566 ms...
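
As a quick sanity check on the torch baseline numbers quoted above, here is a minimal Python sketch that parses the timing line and confirms that the reported total matches gemm + comm, which is what you would expect if the two stages do not overlap. The names `parse_timings` and `speedup` are hypothetical helpers for illustration and are not part of Flux. The fused timing is not shown in the truncated output, so it is left as a parameter.

```python
import re

# Timing line for the unfused torch baseline, copied from the comment above.
line = "torch #0: gemm 0.557 ms, comm 1.009 ms, total 1.566 ms"


def parse_timings(text):
    """Hypothetical helper: map each stage name to its time in milliseconds."""
    pairs = re.findall(r"(\w+) ([\d.]+) ms", text)
    return {name: float(value) for name, value in pairs}


timings = parse_timings(line)

# If GEMM and communication do not overlap, the stage times add up to the total.
stage_sum = timings["gemm"] + timings["comm"]
is_sequential = abs(stage_sum - timings["total"]) < 1e-3


def speedup(baseline_total_ms, fused_total_ms):
    """Hypothetical helper: a value above 1.0 means the fused run is faster."""
    return baseline_total_ms / fused_total_ms
```

To compare against a fused run, pass its measured total as `fused_total_ms`, for example `speedup(timings["total"], fused_total_ms)`. A result below 1.0 would match the observation that the fused implementation is slower than no-fusion.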