st-moe-pytorch
About gating_top_n
Hi, I noticed that the ST-MoE paper includes an experiment with top_n=1. But st_moe_pytorch.py contains this assertion:
assert top_n >= 2, 'must be 2 or more experts'
Can top_n=1 work in this implementation?