st-moe-pytorch
Question on the experts' input
Hi, thanks a lot for your great work!
I tried using this code in my project and found that the input passed to the MoE module (x in the forward function of the MoE class) differs from the input that reaches the first expert (expert_input in the Expert class). Since the first expert is always applied to all tokens, I expected these two inputs to be identical. Is my assumption wrong?
Second, I noticed that the input dimensions differ. In the forward function of MoE, the input is reshaped to (b, e, c, d) here: https://github.com/lucidrains/st-moe-pytorch/blob/6b7f7fbb93610134c902efdfe096e06fe5a7d6b5/st_moe_pytorch/st_moe_pytorch.py#L609-L613 but the Experts class appears to expect (b, e, n, d). If I understand correctly, c is the expert capacity and n is the sequence length, so they are not the same in general. Could you please also enlighten me on this?
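For context on why c and n can differ, here is a toy NumPy sketch of capacity-based dispatch (illustrative only, not the repository's actual routing code; the router assignment and capacity value are made up). The n tokens of each sequence are scattered into fixed-size per-expert buckets of size c, so what each expert receives has shape (b, e, c, d) rather than (b, e, n, d):

```python
import numpy as np

b, n, d = 2, 8, 4   # batch size, sequence length, model dimension
e = 2               # number of experts
c = 6               # expert capacity (e.g. derived from a capacity factor), c != n

rng = np.random.default_rng(0)
x = rng.standard_normal((b, n, d))
expert_index = rng.integers(0, e, size=(b, n))  # toy router: one expert per token

# dispatch: scatter the n tokens into fixed-size per-expert buckets
expert_inputs = np.zeros((b, e, c, d))
next_slot = np.zeros((b, e), dtype=int)  # next free slot per (batch, expert)
for bi in range(b):
    for ti in range(n):
        ei = expert_index[bi, ti]
        s = next_slot[bi, ei]
        if s < c:                        # tokens past the capacity are dropped
            expert_inputs[bi, ei, s] = x[bi, ti]
            next_slot[bi, ei] = s + 1

print(expert_inputs.shape)  # (2, 2, 6, 4), i.e. (b, e, c, d)
```

So the expert-facing tensor is indexed by capacity slot c, not by sequence position n, and some slots may stay zero-padded while overflow tokens are dropped.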
Thank you very much for your help!