MoE-Infinity
Can the MoE-Infinity framework be used in conjunction with the vLLM framework?
I am using a vLLM server to deploy a MoE model. The model has a large number of experts, but only a very small number of them are activated, so it is well suited to expert offloading.
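To make the reasoning concrete, here is a rough back-of-the-envelope sketch of why this kind of model benefits from offloading. It is not MoE-Infinity or vLLM code. The function name `expert_memory_gb` and all the numbers are hypothetical, and it assumes every expert is the same size and the weights are stored in 16-bit precision.

```python
def expert_memory_gb(num_experts, active_experts, params_per_expert,
                     bytes_per_param=2, num_layers=1):
    """Return (total_gb, active_gb) for the expert weights only.

    Hypothetical helper for illustration: assumes equally sized experts
    and ignores attention, embeddings, and the KV cache.
    """
    bytes_per_expert = params_per_expert * bytes_per_param * num_layers
    total_gb = num_experts * bytes_per_expert / 1e9
    active_gb = active_experts * bytes_per_expert / 1e9
    return total_gb, active_gb


# Made-up example: 64 experts, 2 activated, 100M parameters per expert.
total_gb, active_gb = expert_memory_gb(64, 2, 100_000_000)
print(f"all experts: {total_gb:.1f} GB, activated experts: {active_gb:.1f} GB")
print(f"fraction needed at once: {active_gb / total_gb:.1%}")
```

With these made-up numbers only 2 of the 64 experts are needed at a time, so most expert weights could in principle stay in host memory and be fetched on demand.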
This is still a work in progress; currently it is not supported.