Model does not exist
I've built an image with the model weights baked in:
docker build -t account/qwen-vllm --build-arg MODEL_NAME="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" --build-arg QUANTIZATION="awq_marlin" --build-arg BASE_PATH="/models" --platform linux/amd64 .
My ENV variables on my RunPod Serverless Endpoint:
MODEL_NAME=Qwen/Qwen2.5-VL-3B-Instruct-AWQ
DTYPE=float16
GPU_MEMORY_UTILIZATION=0.90
ENABLE_PREFIX_CACHING=0
QUANTIZATION=awq_marlin
LIMIT_MM_PER_PROMPT=image=1,video=0
MAX_MODEL_LEN=16384
ENFORCE_EAGER=true
TRUST_REMOTE_CODE=true
VLLM_IMAGE_FETCH_TIMEOUT=10
HF_HUB_OFFLINE=1
VLLM_USE_MODELSCOPE=0
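Since HF_HUB_OFFLINE=1 blocks any download at startup, the weights must already be inside the image. A quick sanity check (a sketch, assuming the image tag from the build command above and that the cache lives under BASE_PATH at /models/huggingface-cache) is to list the baked hub cache:

```shell
# Confirm the model snapshot was baked into the image at build time.
# The image name and cache path are taken from the build command above;
# adjust them if your Dockerfile stores the cache elsewhere.
docker run --rm account/qwen-vllm \
  ls -R /models/huggingface-cache/hub
```

If no `models--Qwen--...` folder shows up here, the serverless worker has nothing to serve and every model name will 404.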
When calling the endpoint:
curl https://api.runpod.ai/v2/vllm-3x8xxxxxxi/openai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer rpa_xxxx" \
-d '{
"model": "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image."
},
{
"type": "image_url",
"image_url": {
"url": "https://www.site.com/image.jpg"
}
}
]
}
],
"max_tokens": 100,
"temperature": 0.7
}'
ERROR 08-05 07:08:14 [serving_chat.py:136] Error with model object='error' message='The model Qwen/Qwen2.5-VL-3B-Instruct-AWQ does not exist.' type='NotFoundError' param=None code=404
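One way to debug a 404 like this: vLLM's OpenAI-compatible server exposes a model list, so you can ask the endpoint which id it actually registered. A sketch, using the same placeholder endpoint id and API key as the request above:

```shell
# Ask the OpenAI-compatible server which model ids it is serving.
# Whatever id comes back is the exact string to put in the "model" field.
curl https://api.runpod.ai/v2/vllm-3x8xxxxxxi/openai/v1/models \
  -H "Authorization: Bearer rpa_xxxx"
```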
I think I found a solution, but I'm sure there is a better way: when the model is loaded from a local path, you must pass the full snapshot path as the model name in the API call:
"model": "/models/huggingface-cache/hub/models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ/snapshots/e7b623934290c5a4da0ee3c6e1e57bfb6b5abbf2"
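That long path isn't arbitrary: it follows the standard huggingface_hub cache layout (`models--{org}--{name}/snapshots/{revision}`), so it can be reconstructed rather than copied by hand. A minimal sketch, assuming that standard layout (the helper names here are my own, not part of any library):

```python
import os

def hub_cache_model_dir(cache_dir: str, repo_id: str) -> str:
    """Return the folder huggingface_hub creates for a repo id,
    e.g. "Qwen/Qwen2.5-VL-3B-Instruct-AWQ" -> "models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ"."""
    folder = "models--" + repo_id.replace("/", "--")
    return os.path.join(cache_dir, "hub", folder)

def latest_snapshot(cache_dir: str, repo_id: str) -> str:
    """Pick one cached snapshot revision from the repo folder (assumes it exists)."""
    snapshots = os.path.join(hub_cache_model_dir(cache_dir, repo_id), "snapshots")
    return os.path.join(snapshots, sorted(os.listdir(snapshots))[-1])

print(hub_cache_model_dir("/models/huggingface-cache", "Qwen/Qwen2.5-VL-3B-Instruct-AWQ"))
```

As for a cleaner fix: vLLM itself has a `--served-model-name` option that lets the server register a short alias instead of the load path, so if the RunPod worker exposes that setting (check its docs), you could keep using the plain Hugging Face id in API calls.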