Hello! I am a newcomer to MoE, and I am interested in the following question: What do you think of the differences between [Tutel](https://github.com/microsoft/tutel) (or [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed), using dp+tp+ep) in MoE...
Hello, I have tried to use megablocks on a V100 with PyTorch 2.4.0 + cu121, but I get the error "cannot support bf16". If I use megablocks in fp32, I get the error "group gemm must...
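For reference, a minimal capability check in plain PyTorch: the V100 is compute capability 7.0 and has no hardware bf16 (that arrives with Ampere, 8.0+), so a common pattern is to probe the device and fall back to fp16. Whether the grouped-GEMM path in megablocks accepts fp16 on this GPU is not assumed here; the megablocks layer itself is only hinted at in a comment.

```python
import torch

# V100 is compute capability 7.0; hardware bf16 arrives with Ampere (8.0+).
# Probe the device and pick a half-precision dtype it actually supports.
major, minor = torch.cuda.get_device_capability()
use_bf16 = torch.cuda.is_bf16_supported() and major >= 8

compute_dtype = torch.bfloat16 if use_bf16 else torch.float16
print(f"compute capability {major}.{minor}, using {compute_dtype}")

# The MoE layer would then run under autocast with this dtype
# (layer construction omitted; `moe_layer` is hypothetical):
x = torch.randn(8, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=compute_dtype):
    pass  # y = moe_layer(x)
```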
Hello, I want to implement FasterMoE's shadow expert on top of ColossalAI's MoeHybridParallel. Is it possible, and how can I achieve it?
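For context, FasterMoE's "shadow expert" idea is to replicate a heavily loaded expert's weights onto every rank so that tokens routed to it are computed locally instead of travelling through the all-to-all. Below is a minimal sketch of just the replication step in plain `torch.distributed`; the function name, the threshold logic, and the hook point into MoeHybridParallel are assumptions for illustration, not existing ColossalAI API.

```python
import torch
import torch.distributed as dist

def shadow_hot_expert(expert: torch.nn.Module, owner_rank: int) -> None:
    """Broadcast ('shadow') a hot expert's parameters from its owner rank to
    every rank in the default group, so its tokens can be computed locally."""
    for p in expert.parameters():
        dist.broadcast(p.data, src=owner_rank)

# Per step, a rank would roughly:
#   1. count tokens routed to each expert,
#   2. pick experts whose load exceeds a threshold,
#   3. call shadow_hot_expert(...) for each and route their tokens to the
#      local replica, skipping the dispatch all-to-all for those tokens,
#   4. reduce the replica's gradients back to the owner rank during backward.
```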
### Describe the feature
Hello, are there any existing implementations of expert-parallel code for the new MoE models, like Qwen and DeepSeek?
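No claim is made here about existing support in the framework, but for reference, the core of expert parallelism for models like Qwen-MoE and DeepSeek-MoE is an all-to-all token dispatch. A minimal sketch with `torch.distributed`, assuming one expert per rank and an even per-expert capacity so the all-to-all splits are uniform:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor, ep_group=None) -> torch.Tensor:
    """Expert-parallel dispatch: the i-th equal-sized chunk of `tokens` is
    destined for rank i's expert. Returns the tokens this rank's local
    expert should process."""
    recv = torch.empty_like(tokens)
    dist.all_to_all_single(recv, tokens, group=ep_group)
    return recv

# Per MoE layer: route/permute -> dispatch_tokens -> local expert FFN ->
# a second all_to_all_single to send outputs back -> un-permute and
# combine with the gate weights.
```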
Hello, I want to perform inference on the HuggingFace MoE model [Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) with expert parallelism using DeepSpeed in a multi-GPU environment. However, the official tutorials are not comprehensive enough, and...
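As a rough starting point, the sketch below loads the model with transformers and shards it across GPUs with `deepspeed.init_inference`. The exact keyword arguments vary across DeepSpeed versions, kernel injection is disabled to keep the stock HF modules, and whether DeepSpeed applies true expert-parallel kernels to Qwen1.5-MoE is not assumed here.

```python
# Launch (assumed): deepspeed --num_gpus 2 run_qwen_moe.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-MoE-A2.7B"
world_size = int(os.getenv("WORLD_SIZE", "1"))
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the model across the visible GPUs; keep the stock HF modules.
engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
if local_rank == 0:
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```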
### Checklist
- [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
- ...