
Tutel MoE: An Optimized Mixture-of-Experts Implementation

Results: 38 tutel issues

**Describe the bug** Hi, authors. My code seems to hang when `skip_remainder_batch=False`. **To Reproduce** Steps to reproduce the behavior: ``` git clone https://github.com/microsoft/tutel --branch main python3 -m pip uninstall tutel...

application patch

Hi, thanks for this awesome project! I built my transformer model on top of the MoeMlp layer and use EMA for better performance. However, when I try to init my EMA...

enhancement
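
A minimal sketch of one way to maintain an EMA copy of a model's parameters with plain PyTorch, not a tutel-specific API; the decay value and the assumption that the EMA copy is built after the model (and its local expert shard) is placed on its device are illustrative.

```python
import copy
import torch

def build_ema(model: torch.nn.Module):
    # Deep-copy the model so the EMA weights live in a separate module;
    # the EMA copy is never trained directly. For a sharded MoE model,
    # each rank's copy tracks only the experts held locally on that rank.
    ema_model = copy.deepcopy(model)
    for p in ema_model.parameters():
        p.requires_grad_(False)
    return ema_model

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    # ema = decay * ema + (1 - decay) * current, applied parameter-wise.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p.detach(), alpha=1.0 - decay)
```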

Hi, thanks for providing such a wonderful codebase. I have seen and used the save & load of MoE on multiple GPUs, and now I can save them on different ranks...

duplicate
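
A minimal sketch of per-rank checkpointing with plain PyTorch, assuming each rank owns a distinct shard of expert weights; the file-naming scheme and the `model` / `ckpt_dir` names are illustrative, not a tutel API.

```python
import os
import torch
import torch.distributed as dist

def save_sharded(model: torch.nn.Module, ckpt_dir: str):
    # Each rank writes its own state_dict, which contains only the
    # expert shard held locally (plus the replicated shared parameters).
    rank = dist.get_rank()
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(ckpt_dir, f"model.rank{rank}.pt"))

def load_sharded(model: torch.nn.Module, ckpt_dir: str):
    # Loading assumes the same world size and rank layout as at save
    # time, since expert shards are rank-specific.
    rank = dist.get_rank()
    state = torch.load(os.path.join(ckpt_dir, f"model.rank{rank}.pt"), map_location="cpu")
    model.load_state_dict(state)
```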

Hi, thanks for providing such wonderful work. However, I am curious whether you will consider providing pretrained MoE models (e.g., ViT on ImageNet or machine translation tasks).

question

Cannot import JIT optimized kernels. Did you forget to install Custom Kernel Extension?

environmental issue

Hi, I got errors when using `load_importance_loss` (the code works fine with `gshard_loss`). Does anyone have an idea about it? The error log (from one rank/node) is in...

enhancement

How can I install this package in a conda environment? It raised `ERROR: Microsoft Visual C++ 14.0 or greater is required.` but I had already tried `conda install libpython m2w64-toolchain -c...`

setup

DDP in PyTorch cannot distinguish expert parameters from other shared parameters, so experts may be updated with the shared (all-reduced) gradient. TutelDistributedOptimizer seems to be an implementation of ZeRO, which...

question
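
A minimal sketch of one way to keep expert gradients from being averaged: skip the DDP wrapper entirely and all-reduce only the shared (non-expert) gradients by hand. The `skip_allreduce` attribute used to identify expert parameters is an assumption; adapt it to however your model or TutelDistributedOptimizer actually marks experts.

```python
import torch
import torch.distributed as dist

def is_expert_param(p: torch.nn.Parameter) -> bool:
    # Assumption: expert parameters carry a `skip_allreduce` attribute;
    # replace this test with your own expert-parameter check if needed.
    return getattr(p, "skip_allreduce", False)

def allreduce_shared_grads(model: torch.nn.Module):
    # Average gradients of shared parameters across ranks; expert
    # gradients stay local because each rank owns different experts.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is None or is_expert_param(p):
            continue
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)

# Training step without a DDP wrapper:
#   loss.backward()
#   allreduce_shared_grads(model)
#   optimizer.step()
```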

Hi, thank you for your excellent package. I wonder whether tutel can be used seamlessly together with PyTorch's automatic mixed precision package. If so, could you provide some...

question
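
Not an official answer, but a minimal sketch of the standard PyTorch AMP recipe (autocast plus GradScaler) wrapped around an MoE forward pass; `model`, `optimizer`, `criterion`, and the inputs are placeholders, and whether every tutel kernel is autocast-safe would need to be verified separately.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, x, y, criterion):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass (including the MoE layer) in mixed precision.
    with torch.cuda.amp.autocast():
        out = model(x)
        loss = criterion(out, y)
    # Scale the loss to avoid fp16 gradient underflow; the scaler
    # unscales gradients inside scaler.step() before the update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```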