Salomón Mejía

Results: 6 comments by Salomón Mejía

Use this tutorial to run it in a Docker container; it worked for me: https://github.com/vllm-project/vllm/issues/14452

If you're looking for a smooth way to deploy vLLM with 4-bit quantization, here's a solid base setup using Docker, NVIDIA's optimized image and bitsandbytes. We're starting from this base...
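A minimal sketch of such a setup, assuming the official vllm/vllm-openai image as a stand-in for the NVIDIA-optimized image mentioned above, and an example bnb-4bit checkpoint (unsloth/llama-3-8b-bnb-4bit, used here purely for illustration):

```bash
# Sketch: serve a 4-bit bitsandbytes model with vLLM inside Docker.
# Assumes the NVIDIA container toolkit is installed on the host;
# the model name is just an example of a bnb-4bit checkpoint.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model unsloth/llama-3-8b-bnb-4bit \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --max-model-len 4096
```

The container then exposes vLLM's OpenAI-compatible API at http://localhost:8000/v1.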

I downloaded it on a PC with unrestricted internet access and transferred it to the server with STT. That can help.
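A rough sketch of that offline workflow, assuming the download is done with huggingface-cli and the transfer with an scp-style tool (whatever "STT" refers to); the host and path names here are hypothetical:

```bash
# On the machine with internet access: pull the model weights locally.
huggingface-cli download unsloth/llama-3-8b-bnb-4bit --local-dir ./model

# Copy them to the offline server (hypothetical host and path;
# substitute whichever transfer tool you actually use).
scp -r ./model user@server:/opt/models/llama-3-8b-bnb-4bit
```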

I had the same issue, but it is solved in https://github.com/vllm-project/vllm/issues/14452. You will need to use Docker to run it.

Hi @thangnguyenduc1-vti, how do you manage to fit an 8B model into 16GB of VRAM? I understand that even if you're using a dual-GPU system, you don't have 32GB of VRAM...
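For context on the arithmetic behind that question: an 8B-parameter model in fp16 needs about 8 × 2 = 16 GB for the weights alone, before the KV cache and activations, so it cannot fit unquantized in 16 GB of VRAM; at 4 bits per parameter (e.g. bitsandbytes) the weights drop to roughly 8 × 0.5 ≈ 4 GB, which is presumably how it fits.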