OpenMoE
Token routing
Thanks for your work, it is very valuable! I would like to know how you reached your conclusion about token routing. Since the input to each router is affected by attention and RoPE, it does not seem logical to me that each token would have a fixed routing. How can I reproduce your results for this part?
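To make the question concrete, here is a rough sketch of the kind of measurement I have in mind. It assumes one can record, for a given MoE layer, the token ID at each position and the expert the router picked for it. The function name `routing_consistency` and the toy data are hypothetical and not taken from the OpenMoE code; this is only my guess at how such an analysis could be set up, not the authors' method.

```python
from collections import Counter, defaultdict


def routing_consistency(token_ids, expert_ids):
    """Return, per token ID, the share of its occurrences sent to its most frequent expert.

    A value near 1.0 means the token is routed almost independently of context;
    a value near 1/num_experts means routing depends heavily on context.
    """
    if len(token_ids) != len(expert_ids):
        raise ValueError("token_ids and expert_ids must have the same length")

    # Count how often each token ID was assigned to each expert.
    per_token = defaultdict(Counter)
    for tok, exp in zip(token_ids, expert_ids):
        per_token[tok][exp] += 1

    return {
        tok: max(counts.values()) / sum(counts.values())
        for tok, counts in per_token.items()
    }


# Hypothetical toy data: token 5 always goes to expert 2,
# token 9 is spread over experts 0, 1 and 3.
token_ids = [5, 5, 5, 9, 9, 9, 9]
expert_ids = [2, 2, 2, 0, 1, 0, 3]
scores = routing_consistency(token_ids, expert_ids)
print(scores)
```

If the scores stay high across many different contexts for the same token ID, that would support a mostly fixed routing per token; if they drop, the attention and RoPE effects I mentioned would be showing up in the router input.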