MoE-Infinity
PyTorch library for cost-effective, fast, and easy serving of MoE models.
### Prerequisites - [x] I have read the [MoE-Infinity documentation](). - [x] I have searched the [Issue Tracker](https://github.com/EfficientMoE/MoE-Infinity/issues) to ensure this hasn't been reported before. ### System Information Running on...
### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description May I ask what parallel techniques you have implemented? When setting CUDA_VISIBLE_DEVICES=0,1,2,3, all four cards...
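For context on the multi-GPU question above: `CUDA_VISIBLE_DEVICES` must be set before the CUDA runtime is initialized, and it remaps the listed physical GPUs to logical indices `0..N-1` as seen by frameworks such as PyTorch. A minimal sketch of that remapping (illustrative only; it does not touch MoE-Infinity's own parallelism):

```python
import os

# Restrict the process to four physical GPUs; CUDA-aware frameworks
# then see them as logical devices cuda:0 .. cuda:3.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

# Parse the variable the way the CUDA runtime does: a comma-separated
# list of physical device indices, in the order they become visible.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
logical_to_physical = {logical: int(physical)
                       for logical, physical in enumerate(visible)}
print(logical_to_physical)  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Note that the variable must be set before the first CUDA call in the process (or on the shell command line); changing it afterwards has no effect.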
### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description I want to measure the DeepSeek-v2-Lite-Chat throughput of MoE-Infinity using an RTX 4080 Super (16GB). The code I...
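For throughput questions like the one above, the usual metric is generated tokens per second of wall-clock time. A minimal measurement sketch, with a stubbed `generate` standing in for the actual model call (the real call would go through MoE-Infinity's generation API, which is not reproduced here):

```python
import time

def generate(prompt: str, max_new_tokens: int) -> list[int]:
    """Stub standing in for the real model.generate call; returns one
    dummy token id per requested new token."""
    return list(range(max_new_tokens))

def measure_throughput(prompts: list[str], max_new_tokens: int = 128) -> float:
    """Return decode throughput in generated tokens per second."""
    start = time.perf_counter()
    total_tokens = 0
    for p in prompts:
        out = generate(p, max_new_tokens)
        total_tokens += len(out)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

tps = measure_throughput(["hello"] * 4, max_new_tokens=8)
```

In a real benchmark you would also discard a warm-up batch (the first requests pay one-time weight-loading cost, which matters especially for an offloading system) and report prefill and decode throughput separately.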
I want to run inference with other DeepSeek models on a V100 GPU. Are they supported? For example, deepseek-ai's DeepSeek-R1-Distill-Llama-70B or DeepSeek-R1-Distill-Qwen-32B?
### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description Does the current code framework support DeepSeek V3? I found DeepSeek V3 model files in...
Hi! I'm currently running MoE-Infinity with Mixtral-8×7B-Instruct-v0.1-offloading-demo (the quantized version) on MMLU. I encountered a failure when loading the model weights, and I'd like to know whether the MoE-Infinity algorithm is compatible with...
Bumps [pyarrow](https://github.com/apache/arrow) from 12.0.0 to 14.0.1. Commits ba53748 MINOR: [Release] Update versions for 14.0.1 529f376 MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1 b84bbca MINOR: [Release] Update CHANGELOG.md for 14.0.1 f141709...
Thank you for your work. May I ask what the differences are between the open-source code on GitHub and the version described in the paper? I tested **deepseek-chat-lite** on bigbench,...
### Prerequisites - [x] I have read the [MoE-Infinity documentation](). - [x] I have searched the [Issue Tracker](https://github.com/EfficientMoE/MoE-Infinity/issues) to ensure this hasn't been reported before. ### System Information GPU: NVIDIA...
Thanks for your work. I have read through the code but cannot find where `prefetch_experts` is called. This function appears only in *comments* under the `model` directory, where the logic now is...
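To illustrate the concept behind the question above: expert prefetching means a background worker copies the weights of experts the router is predicted to need from host storage into a device-side cache, so the forward pass does not stall on a load. The sketch below is hypothetical and is not MoE-Infinity's actual implementation; the class, its `prefetch_experts` method, and the dict-based "device cache" are all stand-ins for illustration:

```python
import threading
import queue

class ExpertPrefetcher:
    """Illustrative prefetcher: a background thread copies expert
    weights from 'host' storage into a 'device' cache ahead of use."""

    def __init__(self, host_store: dict):
        self.host = host_store      # expert_id -> weights (e.g. on CPU/SSD)
        self.device_cache = {}      # expert_id -> weights ("on GPU")
        self.requests = queue.Queue()
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def prefetch_experts(self, expert_ids):
        """Asynchronously request experts predicted to be needed soon."""
        for eid in expert_ids:
            self.requests.put(eid)

    def _worker(self):
        while True:
            eid = self.requests.get()
            with self.lock:
                if eid not in self.device_cache:
                    self.device_cache[eid] = self.host[eid]  # the "copy"
            self.requests.task_done()

    def get(self, eid):
        """Fetch an expert for the forward pass; a miss pays full latency."""
        with self.lock:
            if eid not in self.device_cache:
                self.device_cache[eid] = self.host[eid]
            return self.device_cache[eid]

pf = ExpertPrefetcher({0: "expert0-weights", 1: "expert1-weights"})
pf.prefetch_experts([0, 1])  # router predicted experts 0 and 1 are needed next
pf.requests.join()           # wait until the background copies finish
w = pf.get(0)                # cache hit: no load stall
```

In a real system the "copy" is a host-to-device transfer overlapped with compute on a separate CUDA stream, and the cache has a capacity limit with an eviction policy.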