tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
**Describe the bug** Hi, Authors. My code seems to hang when `skip_remainder_batch=False`. **To Reproduce** Steps to reproduce the behavior:
```
git clone https://github.com/microsoft/tutel --branch main
python3 -m pip uninstall tutel...
```
Hi, thanks for this awesome project! I built my transformer model on top of the MoeMlp layer and use EMA (exponential moving average) for better performance. However, when I try to initialize my EMA...
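Since the excerpt is truncated, here is only a minimal, generic sketch of a parameter-level EMA in plain PyTorch (the `EMA` class and decay value are illustrative, not part of Tutel); with expert parallelism each rank holds different expert weights, so the shadow copy is naturally per-rank as well.
```
import torch

class EMA:
    """Illustrative parameter-level EMA; not a Tutel API."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        # Detached shadow copy of every trainable parameter on this rank.
        self.decay = decay
        self.shadow = {
            name: p.detach().clone()
            for name, p in model.named_parameters() if p.requires_grad
        }

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # shadow = decay * shadow + (1 - decay) * current parameter
        for name, p in model.named_parameters():
            if name in self.shadow:
                self.shadow[name].mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)
```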
Hi, thanks for providing such a wonderful codebase. I have seen and used the save & load of MoE on multiple GPUs, and now I can save the checkpoints on different ranks...
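For illustration only (plain PyTorch, not Tutel's own checkpoint helper): with expert parallelism each rank owns a different shard of expert weights, so a simple pattern is to have every rank write and reload its own state dict; the file naming below is an assumption.
```
import os
import torch
import torch.distributed as dist

def save_moe_checkpoint(model: torch.nn.Module, ckpt_dir: str) -> None:
    # Every rank saves its own shard, since expert weights differ per rank.
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"moe_rank{dist.get_rank()}.pt")
    torch.save(model.state_dict(), path)

def load_moe_checkpoint(model: torch.nn.Module, ckpt_dir: str) -> None:
    # Reload the shard that matches this rank.
    path = os.path.join(ckpt_dir, f"moe_rank{dist.get_rank()}.pt")
    model.load_state_dict(torch.load(path, map_location="cpu"))
```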
Hi, thanks for providing such a wonderful work. However, I am curious whether you will consider providing pretrained MoE models (e.g., ViT on ImageNet or machine translation tasks).
Cannot import JIT optimized kernels. Did you forget to install Custom Kernel Extension?
Hi, I got errors when using `load_importance_loss` (the code works fine when using `gshard_loss`). Does anyone have an idea about it? The error log (from one rank/node) is in...
How can I install this package in a conda environment? It raised "ERROR: Microsoft Visual C++ 14.0 or greater is required.", but I have already tried `conda install libpython m2w64-toolchain -c...`
DDP in PyTorch cannot distinguish expert parameters from the other, shared parameters, so expert weights may be updated with an all-reduced (shared) gradient. The TutelDistributedOptimizer seems to be an implementation of ZeRO, which...
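One way to picture the problem, as a hedged sketch in plain PyTorch rather than Tutel's or DDP's own mechanism: keep expert gradients rank-local and all-reduce only the shared (non-expert) gradients after `backward()`. The `is_expert_param` predicate is a hypothetical naming convention; adapt it to however your model tags expert parameters.
```
import torch
import torch.distributed as dist

def is_expert_param(name: str) -> bool:
    # Hypothetical convention: expert parameters live under an "experts" submodule.
    return ".experts." in name

@torch.no_grad()
def allreduce_shared_grads(model: torch.nn.Module) -> None:
    # Average gradients of shared parameters across ranks; leave expert
    # gradients untouched so each rank updates only its own experts.
    world_size = dist.get_world_size()
    for name, p in model.named_parameters():
        if p.grad is None or is_expert_param(name):
            continue
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)
```
Called after `loss.backward()` and before `optimizer.step()`, this stands in for wrapping the whole model in DDP, which would otherwise all-reduce the expert gradients too.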
Hi, thank you for your excellent package. I wonder whether Tutel can be used seamlessly together with PyTorch's automatic mixed precision (AMP) package. If so, could you provide some...
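Not an official compatibility statement, just the standard `torch.cuda.amp` training loop one would try around an MoE forward pass; `model`, `optimizer`, and `loader` are placeholders supplied by the caller.
```
import torch

def train_amp(model: torch.nn.Module,
              optimizer: torch.optim.Optimizer,
              loader) -> None:
    scaler = torch.cuda.amp.GradScaler()
    for inputs, targets in loader:
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            outputs = model(inputs)            # MoE forward runs under autocast
            loss = torch.nn.functional.cross_entropy(outputs, targets)
        scaler.scale(loss).backward()          # backward on the scaled loss
        scaler.step(optimizer)                 # unscale gradients, then step
        scaler.update()
```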