Does torch-xla SPMD support expert parallelism?
Does torch-xla SPMD support expert parallelism? If the model is a MoE model, how should the expert computation be expressed in XLA?
❓ Questions and Help
We are actively working on a MoE distributed training example; maybe @alanwaketan can share more details.
Yea, will let you know once we have more information.
@alanwaketan Can you share a bit of your thinking? I want to express the experts in parallel in SPMD, and then add custom calls to handle the routing of variable-length token groups.
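One common way to avoid custom calls for variable-length routing is the fixed expert capacity trick used by XLA-based MoE implementations (e.g. Mixtral/Switch-style layers): since XLA requires static shapes, each expert gets a fixed-size token buffer and overflow tokens are dropped rather than producing ragged per-expert lists. This is a minimal NumPy sketch of that idea, not torch-xla SPMD code; the function name and shapes are illustrative, and in a real model the resulting dispatch tensor would be what you shard across the expert mesh axis.

```python
import numpy as np

def route_top1(logits, capacity):
    """Top-1 MoE routing with a fixed expert capacity (illustrative sketch).

    Instead of variable-length token lists per expert, each expert gets a
    static buffer of `capacity` slots, which keeps all shapes static for XLA.
    Tokens that overflow an expert's buffer are dropped.
    """
    num_tokens, num_experts = logits.shape
    expert = np.argmax(logits, axis=-1)            # chosen expert per token
    one_hot = np.eye(num_experts)[expert]          # (tokens, experts)
    # Running count per expert gives each token its slot index in the buffer.
    position = ((np.cumsum(one_hot, axis=0) - 1.0) * one_hot).sum(axis=-1)
    position = position.astype(int)
    keep = position < capacity                     # overflow tokens are dropped
    # Static-shape dispatch mask: (tokens, experts, capacity).
    dispatch = np.zeros((num_tokens, num_experts, capacity), dtype=bool)
    idx = np.arange(num_tokens)[keep]
    dispatch[idx, expert[keep], position[keep]] = True
    return dispatch, keep
```

With four tokens, two experts, and capacity 1, the second token routed to each expert is dropped; the dispatch tensor (or the einsum it feeds) is then a natural candidate for sharding along the expert dimension in SPMD.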
@alanwaketan Do you have any updates on this issue?
Any updates on this issue? We're trying to use torch-xla SPMD mode to run inference on gpt-oss, and any help (whether that's example code or just documentation) would be greatly appreciated.