Salomón Mejía
Use this tutorial to run it in a Docker container. It worked for me: https://github.com/vllm-project/vllm/issues/14452
I am using an RTX 5080 and a 5090, but both show the same error.
If you're looking for a smooth way to deploy vLLM with 4-bit quantization, here's a solid base setup using Docker, NVIDIA's optimized image and bitsandbytes. We're starting from this base...
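For context, here is a minimal sketch of what the vLLM side of that setup can look like once the container is up. The model name and memory settings below are placeholders, not part of the original setup, and the exact flags depend on your vLLM version:

```python
# Minimal sketch: load a model with bitsandbytes 4-bit quantization in vLLM.
# Model name and limits are placeholders; adjust for your GPU and vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    quantization="bitsandbytes",               # on-the-fly 4-bit quantization
    load_format="bitsandbytes",                # required on some vLLM versions
    max_model_len=4096,                        # keep the KV cache modest
    gpu_memory_utilization=0.90,
)

outputs = llm.generate(
    ["Explain 4-bit quantization in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```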
I downloaded it on a PC with free internet access and transferred it to the server with STT. That might help.
I had the same issue, but it is solved in https://github.com/vllm-project/vllm/issues/14452. You will need to use Docker to run it.
Hi @thangnguyenduc1-vti, how do you manage to fit an 8B model into 16 GB of VRAM? I understand that even with a dual-GPU setup, you don't have 32 GB of VRAM...
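For a rough sense of why 4-bit quantization matters here, a back-of-envelope estimate of the weight memory alone (a sketch that ignores KV cache, activations, and CUDA overhead, which add several more GB):

```python
# Back-of-envelope weight memory for an 8B-parameter model (weights only;
# KV cache, activations, and runtime overhead come on top of this).
params = 8e9
for label, bytes_per_param in [("FP16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{label}: ~{gib:.1f} GiB")
# FP16:  ~14.9 GiB -> barely leaves room on a 16 GB card
# 8-bit:  ~7.5 GiB
# 4-bit:  ~3.7 GiB -> fits with room to spare for the KV cache
```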