Michael Goin

Results: 271 comments by Michael Goin

There was a small issue with the SamplerOutput import that I fixed in the latest commit. After that, the model looks to be performing as expected! ``` lm_eval --model vllm...
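As a sketch of the kind of import fix being described (the exact module paths here are assumptions, since `SamplerOutput` has moved between vLLM releases and the actual change is in the linked commit):

```
# Hypothetical sketch of a SamplerOutput import-path fix; the paths
# below are assumptions, not the exact change from the commit.

# Before (older location):
# from vllm.sequence import SamplerOutput

# After (newer location):
from vllm.model_executor.layers.sampler import SamplerOutput
```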

Is there a model uploaded to HF that I can reproduce with? I would assume this issue is specific to `group_size=32`, is this accurate? I would not be surprised if...
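For reference, a minimal sketch of what `group_size` controls in group-wise weight quantization; this is illustrative only, not the kernel in question:

```
import torch

def quantize_groupwise_int4(weight: torch.Tensor, group_size: int = 32):
    """Illustrative symmetric group-wise INT4 quantization.

    Each row is split into contiguous groups of `group_size` elements and
    each group gets its own scale. group_size=32 yields 4x more scales per
    row than the more common group_size=128, which some fused kernels only
    support for specific group sizes.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "in_features must be divisible by group_size"

    grouped = weight.reshape(out_features, in_features // group_size, group_size)
    scales = grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0  # int4 range [-8, 7]
    q = torch.clamp(torch.round(grouped / scales), -8, 7).to(torch.int8)
    return q.reshape(out_features, in_features), scales.squeeze(-1)

# Example: quantize a toy weight with group_size=32.
q, s = quantize_groupwise_int4(torch.randn(16, 256), group_size=32)
```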

Hi @HaiShaw thanks for pushing up this chunk of work. Is there a reason you haven't tried enabling AMD explicitly through the existing "fp8" quantization backend with the current checkpoint...
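For context, enabling the existing "fp8" quantization backend from the Python API looks roughly like this; the model name below is a placeholder for illustration, not the checkpoint discussed in that PR:

```
from vllm import LLM

# Online FP8 quantization through the existing "fp8" backend.
# The model name is a placeholder, not the checkpoint from the PR.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", quantization="fp8")
outputs = llm.generate("The capital of France is")
print(outputs[0].outputs[0].text)
```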

Could you run an lm-eval to confirm accuracy before marking this ready? i.e.

```
pip install "lm-eval[api]==0.4.7"
lm_eval --model vllm --model_args pretrained=nvidia/DeepSeek-R1-FP4,tensor_parallel_size=8,max_model_len=2048,gpu_memory_utilization=0.99 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
```

I agree, it seems ready to merge. PTAL @simon-mo @DarkLight1337

Can we revive this? I would like to update flashinfer to the latest version now that we have it integrated with V1 as an attention backend.
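For reference, one way to exercise FlashInfer as the attention backend; the environment variables are the standard vLLM selectors, and whether V1 needs to be opted into explicitly depends on the vLLM version in use:

```
import os

# Select FlashInfer as the attention backend before vLLM is imported.
# Whether VLLM_USE_V1 is needed depends on the vLLM version.
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
print(llm.generate("Hello")[0].outputs[0].text)
```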

Hi @abmfy, do you plan to have test updates soon? We can help make them if you don't have time right now.

@zhyncs The goal of making this W4A8 optimization "production-ready" is exactly why I also think it is a good idea to land the first step as simply having this only...