tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
**Describe the bug** Hi, Authors. My code seems to hang when `skip_remainder_batch=False`. **To Reproduce** Steps to reproduce the behavior:
```
git clone https://github.com/microsoft/tutel --branch main
python3 -m pip uninstall tutel...
```
Hi, thanks for this awesome project! I built my transformer model on top of the MoeMlp layer and use EMA (exponential moving average) for better performance. However, when I try to initialize my EMA...
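Since the excerpt is truncated, here is only a minimal, generic sketch of a parameter-level EMA in plain PyTorch (the `EMA` class and decay value are illustrative, not part of Tutel); with expert parallelism each rank holds different expert weights, so the shadow copy is naturally per-rank as well.
```
import torch

class EMA:
    """Illustrative parameter-level EMA; not a Tutel API."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        # Detached shadow copy of every trainable parameter on this rank.
        self.decay = decay
        self.shadow = {
            name: p.detach().clone()
            for name, p in model.named_parameters() if p.requires_grad
        }

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # shadow = decay * shadow + (1 - decay) * current parameter
        for name, p in model.named_parameters():
            if name in self.shadow:
                self.shadow[name].mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)
```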
Hi, thanks for providing such a wonderful codebase. I have seen and used the save & load of MoE on multiple GPUs, and now I can save the checkpoints on different ranks...
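For illustration only (plain PyTorch, not Tutel's own checkpoint helper): with expert parallelism each rank owns a different shard of expert weights, so a simple pattern is to have every rank write and reload its own state dict; the file naming below is an assumption.
```
import os
import torch
import torch.distributed as dist

def save_moe_checkpoint(model: torch.nn.Module, ckpt_dir: str) -> None:
    # Every rank saves its own shard, since expert weights differ per rank.
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"moe_rank{dist.get_rank()}.pt")
    torch.save(model.state_dict(), path)

def load_moe_checkpoint(model: torch.nn.Module, ckpt_dir: str) -> None:
    # Reload the shard that matches this rank.
    path = os.path.join(ckpt_dir, f"moe_rank{dist.get_rank()}.pt")
    model.load_state_dict(torch.load(path, map_location="cpu"))
```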
Hi, thanks for providing such a wonderful work. However, I am curious whether you will consider providing pretrained MoE models (e.g., ViT on ImageNet or machine translation tasks).
Cannot import JIT optimized kernels. Did you forget to install Custom Kernel Extension?
Hi, I got errors when using `load_importance_loss` (the code works fine when using `gshard_loss`). Does anyone have an idea about it? The error log (from one rank/node) is in...
How can I install this package in a conda environment? It raised "ERROR: Microsoft Visual C++ 14.0 or greater is required.", but I have already tried `conda install libpython m2w64-toolchain -c...`
DDP in PyTorch cannot distinguish expert parameters from the other, shared parameters, so expert weights may be updated with an all-reduced (shared) gradient. The TutelDistributedOptimizer seems to be an implementation of ZeRO, which...
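One way to picture the problem, as a hedged sketch in plain PyTorch rather than Tutel's or DDP's own mechanism: keep expert gradients rank-local and all-reduce only the shared (non-expert) gradients after `backward()`. The `is_expert_param` predicate is a hypothetical naming convention; adapt it to however your model tags expert parameters.
```
import torch
import torch.distributed as dist

def is_expert_param(name: str) -> bool:
    # Hypothetical convention: expert parameters live under an "experts" submodule.
    return ".experts." in name

@torch.no_grad()
def allreduce_shared_grads(model: torch.nn.Module) -> None:
    # Average gradients of shared parameters across ranks; leave expert
    # gradients untouched so each rank updates only its own experts.
    world_size = dist.get_world_size()
    for name, p in model.named_parameters():
        if p.grad is None or is_expert_param(name):
            continue
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)
```
Called after `loss.backward()` and before `optimizer.step()`, this stands in for wrapping the whole model in DDP, which would otherwise all-reduce the expert gradients too.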
Hi, thank you for your excellent package. I wonder whether Tutel can be used seamlessly together with PyTorch's automatic mixed precision (AMP) package. If so, could you provide some...
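Not an official compatibility statement, just the standard `torch.cuda.amp` training loop one would try around an MoE forward pass; `model`, `optimizer`, and `loader` are placeholders supplied by the caller.
```
import torch

def train_amp(model: torch.nn.Module,
              optimizer: torch.optim.Optimizer,
              loader) -> None:
    scaler = torch.cuda.amp.GradScaler()
    for inputs, targets in loader:
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            outputs = model(inputs)            # MoE forward runs under autocast
            loss = torch.nn.functional.cross_entropy(outputs, targets)
        scaler.scale(loss).backward()          # backward on the scaled loss
        scaler.step(optimizer)                 # unscale gradients, then step
        scaler.update()
```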