Does torch-xla SPMD support expert parallelism?
Does torch-xla SPMD support expert parallelism? If the model is a MoE model, how should the expert computation be expressed in XLA?
❓ Questions and Help
We are actively working on a MoE distributed training example; maybe @alanwaketan can share more details.
Yea, will let you know once we have more information.
@alanwaketan Can you share a bit of your thinking? I want to express the experts in parallel in SPMD, and then add custom calls to handle the routing of variable-length token groups.
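One common way to avoid custom calls for variable-length routing is the fixed expert capacity trick used by XLA-based MoE implementations (e.g. Mixtral/Switch-style layers): since XLA requires static shapes, each expert gets a fixed-size token buffer and overflow tokens are dropped rather than producing ragged per-expert lists. This is a minimal NumPy sketch of that idea, not torch-xla SPMD code; the function name and shapes are illustrative, and in a real model the resulting dispatch tensor would be what you shard across the expert mesh axis.

```python
import numpy as np

def route_top1(logits, capacity):
    """Top-1 MoE routing with a fixed expert capacity (illustrative sketch).

    Instead of variable-length token lists per expert, each expert gets a
    static buffer of `capacity` slots, which keeps all shapes static for XLA.
    Tokens that overflow an expert's buffer are dropped.
    """
    num_tokens, num_experts = logits.shape
    expert = np.argmax(logits, axis=-1)            # chosen expert per token
    one_hot = np.eye(num_experts)[expert]          # (tokens, experts)
    # Running count per expert gives each token its slot index in the buffer.
    position = ((np.cumsum(one_hot, axis=0) - 1.0) * one_hot).sum(axis=-1)
    position = position.astype(int)
    keep = position < capacity                     # overflow tokens are dropped
    # Static-shape dispatch mask: (tokens, experts, capacity).
    dispatch = np.zeros((num_tokens, num_experts, capacity), dtype=bool)
    idx = np.arange(num_tokens)[keep]
    dispatch[idx, expert[keep], position[keep]] = True
    return dispatch, keep
```

With four tokens, two experts, and capacity 1, the second token routed to each expert is dropped; the dispatch tensor (or the einsum it feeds) is then a natural candidate for sharding along the expert dimension in SPMD.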
@alanwaketan Do you have any updates on this issue?
Any updates on this issue? We're trying to use torch-xla SPMD mode to run inference on gpt-oss, and any help (whether that's example code or just documentation) would be greatly appreciated.