Model does not exist
I've built an image with the model weights baked in:
docker build -t account/qwen-vllm --build-arg MODEL_NAME="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" --build-arg QUANTIZATION="awq_marlin" --build-arg BASE_PATH="/models" --platform linux/amd64 .
My ENV variables on my RunPod Serverless Endpoint:
MODEL_NAME=Qwen/Qwen2.5-VL-3B-Instruct-AWQ
DTYPE=float16
GPU_MEMORY_UTILIZATION=0.90
ENABLE_PREFIX_CACHING=0
QUANTIZATION=awq_marlin
LIMIT_MM_PER_PROMPT=image=1,video=0
MAX_MODEL_LEN=16384
ENFORCE_EAGER=true
TRUST_REMOTE_CODE=true
VLLM_IMAGE_FETCH_TIMEOUT=10
HF_HUB_OFFLINE=1
VLLM_USE_MODELSCOPE=0
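Since HF_HUB_OFFLINE=1 blocks any download at startup, the weights must already be inside the image. A quick sanity check (a sketch, assuming the image tag from the build command above and that the cache lives under BASE_PATH at /models/huggingface-cache) is to list the baked hub cache:

```shell
# Confirm the model snapshot was baked into the image at build time.
# The image name and cache path are taken from the build command above;
# adjust them if your Dockerfile stores the cache elsewhere.
docker run --rm account/qwen-vllm \
  ls -R /models/huggingface-cache/hub
```

If no `models--Qwen--...` folder shows up here, the serverless worker has nothing to serve and every model name will 404.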
When calling the endpoint:
curl https://api.runpod.ai/v2/vllm-3x8xxxxxxi/openai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer rpa_xxxx" \
-d '{
"model": "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image."
},
{
"type": "image_url",
"image_url": {
"url": "https://www.site.com/image.jpg"
}
}
]
}
],
"max_tokens": 100,
"temperature": 0.7
}'
ERROR 08-05 07:08:14 [serving_chat.py:136] Error with model object='error' message='The model Qwen/Qwen2.5-VL-3B-Instruct-AWQ does not exist.' type='NotFoundError' param=None code=404
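One way to debug a 404 like this: vLLM's OpenAI-compatible server exposes a model list, so you can ask the endpoint which id it actually registered. A sketch, using the same placeholder endpoint id and API key as the request above:

```shell
# Ask the OpenAI-compatible server which model ids it is serving.
# Whatever id comes back is the exact string to put in the "model" field.
curl https://api.runpod.ai/v2/vllm-3x8xxxxxxi/openai/v1/models \
  -H "Authorization: Bearer rpa_xxxx"
```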
I think I found a solution, but I'm sure there is a better way: when the model is loaded from a local path, you must pass the full snapshot path as the model name in the API call:
"model": "/models/huggingface-cache/hub/models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ/snapshots/e7b623934290c5a4da0ee3c6e1e57bfb6b5abbf2"
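That long path isn't arbitrary: it follows the standard huggingface_hub cache layout (`models--{org}--{name}/snapshots/{revision}`), so it can be reconstructed rather than copied by hand. A minimal sketch, assuming that standard layout (the helper names here are my own, not part of any library):

```python
import os

def hub_cache_model_dir(cache_dir: str, repo_id: str) -> str:
    """Return the folder huggingface_hub creates for a repo id,
    e.g. "Qwen/Qwen2.5-VL-3B-Instruct-AWQ" -> "models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ"."""
    folder = "models--" + repo_id.replace("/", "--")
    return os.path.join(cache_dir, "hub", folder)

def latest_snapshot(cache_dir: str, repo_id: str) -> str:
    """Pick one cached snapshot revision from the repo folder (assumes it exists)."""
    snapshots = os.path.join(hub_cache_model_dir(cache_dir, repo_id), "snapshots")
    return os.path.join(snapshots, sorted(os.listdir(snapshots))[-1])

print(hub_cache_model_dir("/models/huggingface-cache", "Qwen/Qwen2.5-VL-3B-Instruct-AWQ"))
```

As for a cleaner fix: vLLM itself has a `--served-model-name` option that lets the server register a short alias instead of the load path, so if the RunPod worker exposes that setting (check its docs), you could keep using the plain Hugging Face id in API calls.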