Michael Goin

Results 271 comments of Michael Goin

Please merge with main to fix the docker build

Looks like several of the failing tests are related to the merge 😞 ``` [2025-05-02T21:38:44Z] FAILED kernels/quantization/test_awq_marlin.py::test_fused_marlin_moe_awq[128-6-64-1024-2048-64] - RuntimeError: vllm::fused_marlin_moe() is missing value for argument 'quant_type_id'. Declaration: vllm::fused_marlin_moe(Tensor hidden_states, Tensor...

I've resolved most of the model issues with the above referenced PRs #18002 and #18017 . There is one outstanding issue that it would be useful to have you take...

FYI @dsikka @ElizaWszola as we might want to replace the marlin moe kernel with this new implementation based on benchmarking

Sounds good, sorry for leaving full CI off until now as I thought it was already on. Thanks for getting this close to landing. As a smoke test I ran...

And maybe most importantly, for the case we were using Marlin MoE before, this kernel is now the best choice for Mixtral 8x7B as well ``` # Main's Marlin MoE...

@cpfiffer why do you need to create custom logits processors? I thought outlines would use the structured output feature in vLLM directly by passing in a grammar/json schema to be...