MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Results: 24 MoE-Infinity issues, sorted by recently updated

### Prerequisites - [x] I have read the [MoE-Infinity documentation](). - [x] I have searched the [Issue Tracker](https://github.com/EfficientMoE/MoE-Infinity/issues) to ensure this hasn't been reported before. ### System Information Running on...

bug

### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description May I ask what parallel techniques you have implemented? When setting `CUDA_VISIBLE_DEVICES=0,1,2,3`, all four cards...

enhancement
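For context on the issue above: a minimal sketch of how GPU visibility is typically restricted. `CUDA_VISIBLE_DEVICES` only controls which devices the process can see; whether MoE-Infinity shards or replicates work across the visible cards is exactly what the issue asks about.

```python
# Minimal sketch: CUDA_VISIBLE_DEVICES restricts which GPUs the process can
# see; it does not by itself parallelize anything. It must be set before
# torch initializes CUDA.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

print(torch.cuda.device_count())  # expect 4 when all four cards are visible
```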

### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description I want to measure the throughput of MoE-Infinity with DeepSeek-V2-Lite-Chat on an RTX 4080 Super (16 GB). The code I...

enhancement
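A rough tokens-per-second measurement for the setup above could look like the sketch below. It assumes the `MoE(checkpoint, config)` entry point and the `offload_path`/`device_memory_ratio` config keys shown in the MoE-Infinity README; the offload path, prompt, and generation settings are placeholders.

```python
# Hedged sketch of a decoding-throughput measurement. The MoE(checkpoint,
# config) call and config keys follow the pattern in the MoE-Infinity README;
# paths, prompt, and max_new_tokens are placeholders.
import os
import time

import torch
from transformers import AutoTokenizer
from moe_infinity import MoE

checkpoint = "deepseek-ai/DeepSeek-V2-Lite-Chat"  # model named in the issue
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
config = {
    "offload_path": os.path.expanduser("~/moe-infinity-offload"),
    "device_memory_ratio": 0.75,
}
model = MoE(checkpoint, config)

inputs = tokenizer("Explain mixture-of-experts in one paragraph.",
                   return_tensors="pt").to("cuda")

torch.cuda.synchronize()
start = time.perf_counter()
outputs = model.generate(inputs.input_ids, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs.input_ids.shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```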

I want to run inference with other DeepSeek models on a V100 GPU. Are they supported? For example, deepseek-ai's DeepSeek-R1-Distill-Llama-70B or DeepSeek-R1-Distill-Qwen-32B?

### Prerequisites - [x] I have searched existing issues and reviewed documentation. ### Problem Description Does the current code framework support DeepSeek V3? I found DeepSeek V3 model files in...

enhancement

Hi! I'm currently running MoE-Infinity with Mixtral-8x7B-Instruct-v0.1-offloading-demo (the quantized version) on MMLU. I encountered a failure when loading the model weights, and I'd like to know whether the MoE-Infinity algorithm is compatible with...
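One way to isolate the failure above is to load the unquantized checkpoint along the lines of the README as a baseline, then compare against the quantized offloading-demo weights. The sketch below assumes the `MoE(checkpoint, config)` entry point and config keys from the MoE-Infinity README; the offload path is a placeholder.

```python
# Baseline loading sketch (assumed from the MoE-Infinity README: the
# MoE(checkpoint, config) entry point and its config keys; the offload path
# is a placeholder). If this works while the quantized offloading-demo
# checkpoint fails, the incompatibility is specific to the quantized weights.
import os

from transformers import AutoTokenizer
from moe_infinity import MoE

checkpoint = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # unquantized original
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
config = {
    "offload_path": os.path.expanduser("~/moe-infinity-offload"),
    "device_memory_ratio": 0.75,
}
model = MoE(checkpoint, config)

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(inputs.input_ids.to("cuda"), max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```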

Bumps [pyarrow](https://github.com/apache/arrow) from 12.0.0 to 14.0.1.

Commits
- ba53748 MINOR: [Release] Update versions for 14.0.1
- 529f376 MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1
- b84bbca MINOR: [Release] Update CHANGELOG.md for 14.0.1
- f141709...

dependencies
python

Thank you for your work. May I ask what the differences are between the open-source code on GitHub and the version described in the paper? I tested **deepseek-chat-lite** on BIG-bench,...

### Prerequisites - [x] I have read the [MoE-Infinity documentation](). - [x] I have searched the [Issue Tracker](https://github.com/EfficientMoE/MoE-Infinity/issues) to ensure this hasn't been reported before. ### System Information GPU: NVIDIA...

bug

Thanks for your work. I have read through the code but cannot find where `prefetch_experts` is called. The function only appears in *comments* under the `model` directory, where the logic now is...
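The observation is easy to reproduce with a quick source scan; in the sketch below the `moe_infinity` package path is an assumption about the repository layout.

```python
# Quick scan for prefetch_experts call sites. The moe_infinity/ package path
# is an assumption about the repo layout; adjust to the actual source root.
import pathlib

for path in pathlib.Path("moe_infinity").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), 1):
        if "prefetch_experts" in line:
            print(f"{path}:{lineno}: {line.strip()}")
```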