This is a duplicate request of #177. We are going to add some utility functions to help with this conversion.
Thanks for the information. This is a duplicate of #173. We'll update the fairseq patch to add `inequivalent_tokens=True`, which was recently added to Tutel but is not yet in the fairseq patch. You may...
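For reference, a rough sketch of how the flag would be used — this is not the actual fairseq patch. Assumptions: a distributed environment is already initialized (as in `tutel/examples/helloworld.py`), `inequivalent_tokens` is passed at forward time as in recent Tutel versions, and the gate/expert options below are only illustrative; please check the helloworld examples for the exact keys in your version.

```python
# Illustrative sketch only; constructor options are placeholders.
import torch
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
).cuda()

x = torch.randn(8, 1024, device='cuda')   # local batch; token counts may differ across ranks
y = moe(x, inequivalent_tokens=True)      # tell the layer that ranks may dispatch unequal token counts
```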
That's interesting. If it's true that one GPU performs 5 forwards and another GPU performs 6 forwards, does traditional data parallelism even work? I think the application itself has to...
OK, this root cause makes sense. The MoE layer within one process does not know the application's intent, i.e., whether other processes are going to forward their MoE layers together with it or not....
@zeliu98 We need to add an assertion message to avoid unknown errors like this. And thanks for the information! @Luodian
We have added a `gate_noise` assertion and a device cast in the latest commit. Thanks for pointing out this bug.
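To illustrate the idea (this is a hypothetical sketch, not the code from the actual commit): an assertion with a readable reason plus an explicit device cast, so a bad `gate_noise` value fails early instead of raising an opaque runtime error.

```python
# Hypothetical illustration of the kind of check described above.
import torch

def noisy_gate_scores(scores: torch.Tensor, gate_noise: float, device: torch.device) -> torch.Tensor:
    # Fail fast with an explanatory message instead of an unclear downstream error.
    assert gate_noise >= 0.0, f"Expected gate_noise >= 0, but got {gate_noise}"
    scores = scores.to(device)                          # explicit device cast
    if gate_noise > 0.0:
        scores = scores + torch.randn_like(scores) * gate_noise
    return scores
```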
It is usually due to environment issues (e.g. **an improper CXX compiler, or missing CUDA/NCCL dependencies**) which make the Tutel installation enable CPU support only, e.g. `python3 -m tutel.examples.helloworld --device cpu`....
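A quick sanity check you can run first (assuming the usual case where a CPU-only install traces back to PyTorch itself not seeing CUDA, or to a missing compiler/NCCL at build time):

```python
# Verify that PyTorch was built with CUDA and can see your GPUs before (re)installing Tutel.
import torch

print("torch version  :", torch.__version__)
print("built with CUDA:", torch.version.cuda)        # None means a CPU-only PyTorch build
print("CUDA available :", torch.cuda.is_available())
print("device count   :", torch.cuda.device_count())
```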
Yes, Tutel is able to support native PyTorch AMP. Please follow this example: https://github.com/microsoft/tutel/blob/main/tutel/examples/helloworld_amp.py#L76 Note that you need to use `@autocast()` and `with autocast():` properly according to PyTorch's docs....
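A minimal sketch of the native-AMP pattern, assuming `build_moe_model` and `data_loader` are placeholders for however you construct your Tutel MoE model and data — see `helloworld_amp.py` above for the authoritative example:

```python
# Generic PyTorch AMP loop; model/data construction is a placeholder.
import torch
from torch.cuda.amp import autocast, GradScaler

model = build_moe_model().cuda()               # placeholder: your Tutel MoE model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

for x, target in data_loader:                  # placeholder data loader
    optimizer.zero_grad()
    with autocast():                           # run the forward pass in mixed precision
        output = model(x.cuda())
        loss = torch.nn.functional.mse_loss(output, target.cuda())
    scaler.scale(loss).backward()              # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```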
If you pickle the model for a single GPU, everything will be fine, because AllToAll is not involved in Tutel's MoE layer in that case. Does that match your expectation? PyTorch's NCCL operations...
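A sketch for the single-GPU case (no NCCL/AllToAll state involved); `model` and `build_moe_model` are placeholders for your own model and its constructor. Saving the `state_dict` is the usual robust route, and pickling the full module object also works here but is more version-fragile:

```python
# Save/restore a single-GPU MoE model via its state_dict.
import torch

torch.save(model.state_dict(), "moe_checkpoint.pt")        # parameters only

restored = build_moe_model().cuda()                         # placeholder: rebuild the same architecture
restored.load_state_dict(torch.load("moe_checkpoint.pt"))
restored.eval()
```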