st-moe-pytorch
About gating_top_n
Hi, I noticed there is an experiment with top_n=1 in the ST-MoE paper. But in st_moe_pytorch.py there is:
assert top_n >= 2, 'must be 2 or more experts'
Can top_n=1 work in this implementation?
@Heihaierr not here, as it was just to be faithful to the paper, which explored 2 and then a generalization of top-n (iirc) up to 3 and 4
i thought that top 1 didn't work that well?
Yes, but the paper also explored top-1 routing and shows improvement.
ah I see, yeah they did, but 2 is still recommended
Got it, thanks for the quick reply.
If top_n=1, how should we achieve it?
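For illustration only, here is a minimal sketch of what top-1 routing means, written in plain Python and independent of st-moe-pytorch. The names softmax, top1_route, and top1_moe are hypothetical and are not part of the library. It assumes Switch-style gating, where the chosen expert's output is scaled by its raw router probability. If the selected gates were instead renormalized to sum to one, a single selected gate would always equal 1 and the router would receive no gradient through it.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_route(gate_logits):
    # For each token, pick the single highest-probability expert.
    # Returns a list of (expert_index, gate_weight) pairs.
    routes = []
    for logits in gate_logits:
        probs = softmax(logits)
        expert = max(range(len(probs)), key=probs.__getitem__)
        # Keep the raw probability as the gate weight (assumption: Switch-style).
        routes.append((expert, probs[expert]))
    return routes

def top1_moe(tokens, gate_logits, experts):
    # Send each token to its one selected expert and scale the output.
    outputs = []
    for x, (expert, weight) in zip(tokens, top1_route(gate_logits)):
        outputs.append(weight * experts[expert](x))
    return outputs

# Toy usage: two scalar "experts" and two scalar tokens.
experts = [lambda x: x + 1.0, lambda x: x * 2.0]
tokens = [1.0, 3.0]
gate_logits = [[2.0, 0.0], [0.0, 2.0]]
print(top1_route(gate_logits))
print(top1_moe(tokens, gate_logits, experts))
```

This leaves out expert capacity limits and the auxiliary balancing losses, which a real top-1 setup would still need.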