st-moe-pytorch

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch

5 st-moe-pytorch issues

Hi, thanks a lot for your great work! I tried using this code on my project and I found that the input that goes to the MoE module (`x` in...

Do you know how this giant all-reduce works for large architectures across hundreds of workers? I'm specifically interested in this bit of code: ``` if is_distributed: ... # gather and...
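For context, here is a minimal sketch of the collective primitive this question is about, assuming the process group is already initialized and every rank contributes a tensor of the same shape. The function name `gather_across_ranks` is hypothetical; the actual code in `st_moe_pytorch.py` also handles variable per-rank batch sizes and auxiliary-loss reduction, which this sketch does not cover.

```python
import torch
import torch.distributed as dist

def gather_across_ranks(x):
    # hypothetical helper: collect the local tensor from every worker so that,
    # after the call, each rank holds the concatenation of all ranks' inputs
    world_size = dist.get_world_size()
    gathered = [torch.empty_like(x) for _ in range(world_size)]
    dist.all_gather(gathered, x)
    return torch.cat(gathered, dim = 0)
```

The cost of such a gather grows with both the world size and the per-rank tensor size, which is why its behavior at hundreds of workers is a reasonable thing to ask about.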

IIUC, the [topk](https://github.com/lucidrains/CoLT5-attention/blob/main/colt5_attention/topk.py) in colt5_attention uses [coor_descent](https://github.com/lucidrains/CoLT5-attention/blob/main/colt5_attention/coor_descent.py#L17), and, according to Eq. 8-11 of the original [paper](https://openreview.net/pdf?id=IyYyKov0Aj), it seems to expect the input to be unnormalized. However, in the forward...
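For reference, a minimal sketch (with hypothetical tensor names, not the library's actual forward pass) of why normalization matters here: a hard `torch.topk` selects the same expert indices whether it sees raw logits or softmax probabilities, since softmax is monotonic per row, but a differentiable relaxation such as coordinate descent operates on the raw values, so its output depends on whether the input was normalized.

```python
import torch

logits = torch.randn(4, 8)          # (tokens, num_experts) raw, unnormalized router scores
probs  = logits.softmax(dim = -1)   # normalized scores summing to 1 per token

# hard top-k picks identical expert indices either way (softmax preserves ordering)
idx_from_logits = logits.topk(2, dim = -1).indices
idx_from_probs  = probs.topk(2, dim = -1).indices
assert torch.equal(idx_from_logits, idx_from_probs)

# a soft relaxation of top-k (e.g. the coordinate descent used by colt5_attention)
# is sensitive to the scale of its input, so passing probs instead of logits
# changes its result, which is the normalization concern raised above
```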

First of all, thank you for your project; it looks great! I have been trying to apply it to ViT, just like V-MoE. During the training process, I observed some...

Hi, I notice there is an experiment with `top_n=1` in the ST-MoE paper, but in `st_moe_pytorch.py` there is `assert top_n >= 2, 'must be 2 or more experts'`. Can `top_n=1` work in...
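As a reference point, a minimal sketch of top-n expert selection with `torch.topk`, showing that the selection step itself works mechanically for `top_n=1`; the assert in `st_moe_pytorch.py` presumably reflects the library's design, where downstream routing logic assumes a second routed expert. The `route` helper below is purely illustrative, not a patch to the library.

```python
import torch

def route(logits, top_n = 2):
    # hypothetical helper: pick the top-n experts per token from raw router logits
    gates, indices = logits.softmax(dim = -1).topk(top_n, dim = -1)
    return gates, indices

logits = torch.randn(4, 8)              # (tokens, num_experts)

gates2, idx2 = route(logits, top_n = 2) # top-2 routing, as the assert enforces
gates1, idx1 = route(logits, top_n = 1) # top-1 selection runs fine here, but the
                                        # library's later steps may assume >= 2 experts
```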