Luka Govedič

53 comments of Luka Govedič

@calvin0327 #18974 landed and #19312 is adding more cleanup - should we close this PR and add any additional improvements into #19312?

Maybe add `C10_HOST_DEVICE` back in, like the original constant?

@jeffdaily it looks like you missed an import in `vllm.model_executor.layers.quantization.utils.fp8_utils`:

```
ImportError: cannot import name 'current_platform_fp8_dtype' from 'vllm.model_executor.layers.quantization.utils.fp8_utils' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py)
```

Could you get that fixed, and we can enable the...
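For reference, a minimal sketch of what the missing symbol might look like, assuming it is meant to resolve the FP8 dtype supported by the current platform (the actual definition in the PR may differ):

```python
# Hypothetical sketch of the symbol missing from
# vllm/model_executor/layers/quantization/utils/fp8_utils.py.
# Assumes it should pick the FP8 dtype the current platform supports.
import torch

from vllm.platforms import current_platform

# ROCm uses the e4m3fnuz FP8 variant; CUDA and other platforms use e4m3fn.
current_platform_fp8_dtype = (torch.float8_e4m3fnuz
                              if current_platform.is_rocm() else
                              torch.float8_e4m3fn)
```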

Yeah, that was a nice cleanup! @robertgshaw2-redhat @mgoin could you guys help with the fp8 support on other platforms?

@jeffdaily I am planning to look into it more, but do you know if this is a ROCm version issue? This PR is breaking the build on `main`:

```
/mnt/nvme3n1p1/sage/git/nm-vllm/csrc/quantization/fp8/amd/quant_utils.cuh:25:33:...
```

Could you help with the fix? I assume there's an alternative for ROCm 6.2? It's not the CI; it's a bug in a local build.

I think forward-fixing is fine; the revert wouldn't be immediate anyway. Unless you think the fix will be complicated?

Okay, ping me on vLLM Slack once it's done so we can merge ASAP.

Btw, PyTorch [updated the auto-functionalization](https://dev-discuss.pytorch.org/t/a-new-strategy-for-automatic-custom-operators-functionalization/2733), which I think will break our custom fusion passes. So we should disable it; there's an inductor config field called `enable_auto_functionalized_v2`. @mgoin do you want...
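For context, a minimal sketch of how the flag could be turned off, assuming the field lives in `torch._inductor.config` like other inductor knobs (where exactly vLLM would set it is outside this snippet):

```python
# Minimal sketch: keep inductor on auto_functionalized v1 so custom fusion
# passes still match the graph pattern they were written against.
# Assumes enable_auto_functionalized_v2 sits in torch._inductor.config.
import torch._inductor.config as inductor_config

inductor_config.enable_auto_functionalized_v2 = False
```

The same field should also be settable per-compile via `torch.compile(model, options={"enable_auto_functionalized_v2": False})`, since inductor reads config overrides from the options dict.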