Alexis

Results: 3 issues by Alexis

I have a vLLM Serverless Endpoint with Qwen2.5-VL-3B-Instruct. I call it with OpenAI-format API calls: ``` curl https://api.runpod.ai/v2/5abxxxxxxsk5a/openai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer rpa_2BFKI96Y02Qxxxxxxxxxxu6u4i" \ -d...

I've built an image with baked-in model weights: `docker build -t account/qwen-vllm --build-arg MODEL_NAME="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" --build-arg QUANTIZATION="awq_marlin" --build-arg BASE_PATH="/models" --platform linux/amd64 .` My ENV variables on my RunPod Serverless Endpoint: ```...

My goal is to get an image description as quickly as possible. How can I speed up inference? Did I miss anything, or add unnecessary parameters? I'm getting an...