st-moe-pytorch

About gating_top_n

Open Heihaierr opened this issue 1 year ago • 5 comments

Hi, I noticed that the ST-MoE paper includes an experiment with top_n=1. But st_moe_pytorch.py has this check: assert top_n >= 2, 'must be 2 or more experts'. Can top_n=1 work in this implementation?

Heihaierr, Dec 01 '23 08:12

@Heihaierr not here, as it was just to be faithful to the paper, which explored 2 and then a generalization of top-n (iirc) up to 3 and 4

i thought that top 1 didn't work that well?

lucidrains, Dec 01 '23 17:12

Yes, but the paper also explored top-1 routing and showed an improvement.

Heihaierr, Dec 04 '23 02:12

ah I see, yeah they did, but 2 is still recommended

[Image: Screenshot_20231203-183548_Adobe Acrobat]

lucidrains, Dec 04 '23 02:12

Got it, thanks for the quick reply.

Heihaierr, Dec 04 '23 02:12

If we want top_n=1, how can we achieve it?

moon4869, Jan 11 '24 07:01
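
For anyone wondering what top-1 routing would look like in principle, here is a minimal sketch in plain Python. It is not code from st-moe-pytorch and does not use its API: the names softmax, route_top_n and renormalize are hypothetical, chosen only for illustration, and it handles a single token with no capacity limits or auxiliary losses. One thing to watch: if the kept gate weights are renormalized to sum to 1, the single weight for top_n=1 is always exactly 1, so the router probability no longer scales the expert output. The Switch Transformer style of top-1 routing keeps the raw softmax probability as the weight instead, which is the default assumed here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_n(gate_logits, top_n=1, renormalize=False):
    """Pick the top_n experts for one token (hypothetical helper).

    Returns a list of (expert_index, weight) pairs, highest weight first.
    With renormalize=False the weight is the raw softmax probability.
    With renormalize=True the kept weights are rescaled to sum to 1.
    """
    if not 1 <= top_n <= len(gate_logits):
        raise ValueError("top_n must be between 1 and the number of experts")

    probs = softmax(gate_logits)

    # Rank expert indices by descending router probability, keep the first top_n.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]

    if renormalize:
        kept = sum(probs[i] for i in ranked)
        return [(i, probs[i] / kept) for i in ranked]
    return [(i, probs[i]) for i in ranked]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # made-up router logits for 4 experts
    print(route_top_n(logits, top_n=1))                     # one expert, raw probability
    print(route_top_n(logits, top_n=2, renormalize=True))   # two experts, weights sum to 1
```

To get top-1 behaviour inside the library itself, the assert mentioned at the top of this thread would have to be relaxed and the gate handling checked for the renormalization issue described above; treat that as an assumption to verify against the source rather than a confirmed recipe.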