[QUESTION] Hello, how should I modify flux/test/python/moe_ag_scatter/test_moe_ag.py to set tp size=2 when running on 8 GPUs?

Open jinchen89 opened this issue 8 months ago • 4 comments

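I see that the tp size can be set freely in flux/test/python/moe_gather_rs/test_moe_gather_rs.py? My understanding is that the tp size does not have to equal the number of GPUs. I would appreciate any guidance, thanks!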

jinchen89 avatar Apr 19 '25 08:04 jinchen89

In test_moe_ag.py, the tp size you want is actually ffn_tp_size

ZSL98 avatar Apr 23 '25 11:04 ZSL98

So TP_GROUP.size() refers to the number of physical GPUs?

jinchen89 avatar Apr 24 '25 02:04 jinchen89

In the context of sequence parallelism, the input is partitioned into shards, with the number of shards being equal to TP_GROUP.size(), as is the situation in our example. In this scenario, the TP_GROUP.size() is equivalent to the number of devices.
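As a rough illustration of the sharding described above, here is a minimal pure-Python sketch. It is not taken from the flux code base: the helper name `shard_sequence` and the toy token list are hypothetical, and `group_size` simply stands in for the value returned by TP_GROUP.size().

```python
def shard_sequence(tokens, group_size):
    """Split `tokens` into `group_size` contiguous, equal-length shards.

    `group_size` plays the role of TP_GROUP.size() (an assumption made for
    this sketch); shard r would be the slice held by rank r.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if len(tokens) % group_size != 0:
        raise ValueError("sequence length must be divisible by group_size")
    shard_len = len(tokens) // group_size
    return [tokens[r * shard_len:(r + 1) * shard_len] for r in range(group_size)]


# Toy example: 16 tokens sharded across 8 devices, 2 tokens per shard.
tokens = list(range(16))
shards = shard_sequence(tokens, 8)
print(len(shards), shards[0], shards[-1])
```

With 8 devices the sequence is cut into 8 shards regardless of the value chosen for ffn_tp_size, which is why the two numbers can differ.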

ZSL98 avatar Apr 24 '25 09:04 ZSL98

Just change the code in launch.sh: set nproc_per_node=2.

LevinSx avatar Aug 14 '25 11:08 LevinSx