[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod

Open WoosukKwon opened this issue 1 year ago • 5 comments

Currently, UnquantizedFusedMoEMethod directly imports the Triton fused MoE kernel and related CUDA kernels, preventing other hardware backends from supporting MoE models. This PR adds the CustomOp interface to it so that the kernels are imported only for NVIDIA and AMD GPUs.
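As a rough illustration of the idea described above, here is a minimal, self-contained sketch of platform-based dispatch with a lazily imported GPU kernel. It does not use vLLM's actual code: `CustomOpSketch`, `ToyUnquantizedMoE`, the `platform` string, and the module name `hypothetical_gpu_kernels` are all assumptions made for this example, and the "MoE" here is a toy top-1 scaling rather than a real fused kernel.

```python
import importlib
from typing import Callable, List


class CustomOpSketch:
    """Hypothetical stand-in for a CustomOp-style base class (not vLLM's API)."""

    def __init__(self, platform: str) -> None:
        self.platform = platform
        # Pick the implementation once, at construction time.
        self._forward = self._dispatch()

    def _dispatch(self) -> Callable[..., List[float]]:
        # Only NVIDIA ("cuda") and AMD ("rocm") take the GPU kernel path.
        if self.platform in ("cuda", "rocm"):
            return self.forward_cuda
        return self.forward_native

    def __call__(self, *args) -> List[float]:
        return self._forward(*args)

    def forward_native(self, *args) -> List[float]:
        raise NotImplementedError

    def forward_cuda(self, *args) -> List[float]:
        raise NotImplementedError


class ToyUnquantizedMoE(CustomOpSketch):
    """Toy MoE method: each token is scaled by its selected expert's weight."""

    def __init__(self, platform: str, expert_scales: List[float]) -> None:
        super().__init__(platform)
        self.expert_scales = expert_scales

    def forward_native(self, x: List[float], expert_ids: List[int]) -> List[float]:
        # Pure-Python fallback that any backend can run.
        return [self.expert_scales[e] * v for v, e in zip(x, expert_ids)]

    def forward_cuda(self, x: List[float], expert_ids: List[int]) -> List[float]:
        # The GPU-only module is imported here, not at module import time,
        # so other backends never need it installed.
        # "hypothetical_gpu_kernels" is a made-up name and does not exist.
        kernels = importlib.import_module("hypothetical_gpu_kernels")
        return kernels.fused_moe(x, expert_ids, self.expert_scales)


cpu_op = ToyUnquantizedMoE("cpu", [2.0, 10.0])
result = cpu_op([1.0, 3.0], [0, 1])
```

Because the GPU import lives inside `forward_cuda`, constructing and calling the op on a non-GPU platform never touches the GPU-only module; only the selected path pays for its own dependencies.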

WoosukKwon avatar Jul 10 '24 06:07 WoosukKwon

Does this need to be added to the fp8 method as well? Or are we handling quantization separately?

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L220

robertgshaw2-redhat avatar Jul 10 '24 12:07 robertgshaw2-redhat

@robertgshaw2-neuralmagic We haven't used the CustomOp interface for the quantization-related ops, since they usually only support NVIDIA or AMD GPUs. Do you want to apply the interface to the quant ops?

WoosukKwon avatar Jul 10 '24 17:07 WoosukKwon

> @robertgshaw2-neuralmagic We haven't used the CustomOp interface for the quantization-related ops, since they usually only support NVIDIA or AMD GPUs. Do you want to apply the interface to the quant ops?

I think it's okay to leave it for now and make the modifications once we have a need for it.

robertgshaw2-redhat avatar Jul 10 '24 17:07 robertgshaw2-redhat

This PR seems to break Mixtral. Let me check the reason.

WoosukKwon avatar Jul 10 '24 19:07 WoosukKwon

What tensor-parallel (TP) size is it running at? @WoosukKwon

robertgshaw2-redhat avatar Jul 10 '24 19:07 robertgshaw2-redhat

@comaniac Could you please take a look? The PR removes a few lines of code in model loader that you marked as FIXME.

WoosukKwon avatar Jul 15 '24 16:07 WoosukKwon

> @comaniac Could you please take a look? The PR removes a few lines of code in model loader that you marked as FIXME.

That FIXME can be removed safely. Please let me know if the test still fails and I'll take a look.

comaniac avatar Jul 15 '24 17:07 comaniac

@comaniac Thanks for the confirmation! It works well.

WoosukKwon avatar Jul 15 '24 18:07 WoosukKwon