锅蛋钉

12 comments of 锅蛋钉

> The main difference is what assumption to base on.
>
> The assumption of Tutel MoE is no assumption, e.g. allowing switching execution approaches during runtime, without influencing designed...

> In other words, the gain from Tutel benefits general situation for sure, while the gain from FasterMoE depends on experimental factors, e.g. predictor accuracy / weight migration penalty /...

OK, thanks! I will check it.

After I changed my env to A100, the issue still exists. I have no idea what to do.

```
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/megablocks/layers/moe.py", line 468, in forward
[rank0]:     out = self.experts(x, scores,...
```

After I reinstalled megablocks via

```bash
pip install megablocks[all]
```

instead of installing from source (what I had done), the issue disappears.

OK! And what are the differences between the "grouped" impl and the "sparse" impl? Which one gives better throughput?

But why is the grouped one recommended instead of the sparse one for H-GPUs?

```
Installing megablocks[gg] enables dMoE computation with grouped GEMM. This feature is enabled by setting the mlp_impl argument...
```
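For context, a minimal sketch of how that choice might look in code, assuming megablocks exposes `Arguments` and `dMoE` with an `mlp_impl` field as the quoted README text suggests; field names and defaults may differ between versions, and the sizes here are purely illustrative:

```python
# Hedged sketch: selecting the dMoE MLP kernel ("sparse" vs "grouped").
# Assumes megablocks.layers.arguments.Arguments and megablocks.layers.dmoe.dMoE
# accept an `mlp_impl` option as described in the quoted README; verify against
# the installed version before relying on these exact names.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_impl="grouped",  # "sparse" is the other option; "grouped" requires megablocks[gg]
)

layer = dMoE(args).cuda().to(torch.bfloat16)
x = torch.randn(8, 512, 1024, device="cuda", dtype=torch.bfloat16)
out = layer(x)  # forward pass; the exact return signature may vary by version
```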

There is another issue. I have tried the dMoE fwd, which passes on my 1xA100 device but stops (does not crash) on my 2xA6000 device. Both are launched by

```bash
torchrun --standalone --nnodes=1...
```
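To narrow down where a multi-GPU hang like this comes from, a minimal hedged sketch of a torchrun-launched check: it only initializes NCCL and runs a tiny all_reduce, so if this also stalls on the 2xA6000 box, the problem is in the launcher/communication setup rather than in the megablocks layer. The script name and sizes are hypothetical.

```python
# toy_dist_fwd.py -- hypothetical reproduction script, launched with e.g.:
#   torchrun --standalone --nnodes=1 --nproc_per_node=2 toy_dist_fwd.py
# It checks only process-group init plus a small all_reduce, to separate
# NCCL/P2P issues from MoE-specific ones.
import os
import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    rank = dist.get_rank()

    # Tiny collective; if this hangs too, the MoE layer is not the culprit.
    t = torch.ones(4, device="cuda") * rank
    dist.all_reduce(t)
    print(f"[rank{rank}] all_reduce ok: {t.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```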

OK, I have tried it on a 2xA100 device and the issue disappears. It seems that some tweaks are needed to adapt it to devices other than A100/H100.