Luka Govedič

53 comments of Luka Govedič

@calvin0327 #18974 landed and #19312 is adding more cleanup - should we close this PR and add any additional improvements into #19312?

Maybe add `C10_HOST_DEVICE` back in, like the original constant?

@jeffdaily it looks like you missed an import in `vllm.model_executor.layers.quantization.utils.fp8_utils`:

```
ImportError: cannot import name 'current_platform_fp8_dtype' from 'vllm.model_executor.layers.quantization.utils.fp8_utils' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py)
```

Could you get that fixed, and we can enable the...
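For reference, a minimal sketch of what the missing symbol might look like, assuming it is meant to resolve the FP8 dtype supported by the current platform (the actual definition in the PR may differ):

```python
# Hypothetical sketch of the symbol missing from
# vllm/model_executor/layers/quantization/utils/fp8_utils.py.
# Assumes it should pick the FP8 dtype the current platform supports.
import torch

from vllm.platforms import current_platform

# ROCm uses the e4m3fnuz FP8 variant; CUDA and other platforms use e4m3fn.
current_platform_fp8_dtype = (torch.float8_e4m3fnuz
                              if current_platform.is_rocm() else
                              torch.float8_e4m3fn)
```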

Yeah, that was a nice cleanup! @robertgshaw2-redhat @mgoin could you guys help with the fp8 support on other platforms?

@jeffdaily I am planning to look into it more, but do you know if this is a ROCm version issue? This PR is breaking the build on `main`:

```
/mnt/nvme3n1p1/sage/git/nm-vllm/csrc/quantization/fp8/amd/quant_utils.cuh:25:33:...
```

Could you help with the fix? I assume there's an alternative for ROCm 6.2? It's not the CI; it's a bug in a local build.

I think forward-fixing is fine; the revert wouldn't be immediate anyway. Unless you think the fix will be complicated?

Okay, ping me on vLLM Slack once it's done so we can merge ASAP.

Btw, PyTorch [updated the auto-functionalization](https://dev-discuss.pytorch.org/t/a-new-strategy-for-automatic-custom-operators-functionalization/2733), which I think will break our custom fusion passes. So we should disable it; there's an inductor config field called `enable_auto_functionalized_v2`. @mgoin do you want...
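For context, a minimal sketch of how the flag could be turned off, assuming the field lives in `torch._inductor.config` like other inductor knobs (where exactly vLLM would set it is outside this snippet):

```python
# Minimal sketch: keep inductor on auto_functionalized v1 so custom fusion
# passes still match the graph pattern they were written against.
# Assumes enable_auto_functionalized_v2 sits in torch._inductor.config.
import torch._inductor.config as inductor_config

inductor_config.enable_auto_functionalized_v2 = False
```

The same field should also be settable per-compile via `torch.compile(model, options={"enable_auto_functionalized_v2": False})`, since inductor reads config overrides from the options dict.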