[MOE-transformer] How do you build the static graph of an MOE model?
Thank you for your great work. I'm curious about how the static graph is constructed for the MOE-Transformer.
Q: With 1024 experts and the switch gating method, do you need to build 1024 different graphs for the FFN_Module? What are the details of the graph construction?
(Is it related to #40?)
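
To make the question concrete, here is the alternative to 1024 separate subgraphs that I would guess at: the experts share one set of ops, their weights carry a leading expert dimension, and tokens are routed with a one-hot dispatch tensor. This is only a NumPy sketch of that guess, not taken from the Mesh TensorFlow code. The function name `switch_ffn` and all parameter names are hypothetical, and it leaves out expert capacity limits and any load-balancing loss.

```python
import numpy as np


def switch_ffn(x, w_gate, w_in, w_out):
    """Top-1 (switch) MoE feed-forward layer as one batched computation.

    Hypothetical shapes (assumptions for illustration):
      x:      [tokens, d_model]
      w_gate: [d_model, experts]
      w_in:   [experts, d_model, d_ff]
      w_out:  [experts, d_ff, d_model]
    """
    # Router: softmax over experts, then pick the top-1 expert per token.
    logits = x @ w_gate
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    expert = probs.argmax(axis=-1)                      # [tokens]

    num_experts = w_gate.shape[1]
    dispatch = np.eye(num_experts)[expert]              # one-hot [tokens, experts]
    gate = (probs * dispatch).sum(axis=-1)              # chosen prob, [tokens]

    # Dispatch: each expert sees a [tokens, d_model] slab that is zero
    # for tokens not routed to it.
    x_e = np.einsum("te,td->etd", dispatch, x)

    # One einsum per FFN layer covers every expert at once; the expert
    # index "e" is just another tensor dimension, not a separate graph.
    h = np.maximum(np.einsum("etd,edf->etf", x_e, w_in), 0.0)
    y_e = np.einsum("etf,efd->etd", h, w_out)

    # Combine: take each token's output from its chosen expert, scaled
    # by the gate probability.
    return np.einsum("te,t,etd->td", dispatch, gate, y_e)
```

In this form the number of ops does not grow with the number of experts; only the size of the expert dimension does. Whether this matches the actual construction is exactly what I am asking.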