st-moe-pytorch
Question on the experts' input
Hi, thanks a lot for your great work!
I tried using this code in my project and found that the input passed to the MoE module (x in the forward function of the MoE class) differs from the input that reaches the first expert (expert_input in the Expert class). Since the first expert is always applied to all tokens, I expected these two inputs to be identical. Is my assumption wrong?
Second, I noticed that the input dimensions differ. In the forward function of MoE, the input is reshaped to (b, e, c, d) here: https://github.com/lucidrains/st-moe-pytorch/blob/6b7f7fbb93610134c902efdfe096e06fe5a7d6b5/st_moe_pytorch/st_moe_pytorch.py#L609-L613 but the Experts class appears to expect (b, e, n, d). If I understand correctly, c is the expert capacity and n is the sequence length, so they are not the same in general. Could you please also enlighten me on this?
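For context on why c and n can differ, here is a toy NumPy sketch of capacity-based dispatch (illustrative only, not the repository's actual routing code; the router assignment and capacity value are made up). The n tokens of each sequence are scattered into fixed-size per-expert buckets of size c, so what each expert receives has shape (b, e, c, d) rather than (b, e, n, d):

```python
import numpy as np

b, n, d = 2, 8, 4   # batch size, sequence length, model dimension
e = 2               # number of experts
c = 6               # expert capacity (e.g. derived from a capacity factor), c != n

rng = np.random.default_rng(0)
x = rng.standard_normal((b, n, d))
expert_index = rng.integers(0, e, size=(b, n))  # toy router: one expert per token

# dispatch: scatter the n tokens into fixed-size per-expert buckets
expert_inputs = np.zeros((b, e, c, d))
next_slot = np.zeros((b, e), dtype=int)  # next free slot per (batch, expert)
for bi in range(b):
    for ti in range(n):
        ei = expert_index[bi, ti]
        s = next_slot[bi, ei]
        if s < c:                        # tokens past the capacity are dropped
            expert_inputs[bi, ei, s] = x[bi, ti]
            next_slot[bi, ei] = s + 1

print(expert_inputs.shape)  # (2, 2, 6, 4), i.e. (b, e, c, d)
```

So the expert-facing tensor is indexed by capacity slot c, not by sequence position n, and some slots may stay zero-padded while overflow tokens are dropped.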
Thank you very much for your help!