worker-vllm

How to call the API with the RunPod /run format?

Status: Open. AlexisMDP opened this issue 4 months ago (0 comments).

I have a vLLM Serverless Endpoint running Qwen2.5-VL-3B-Instruct, and I call it with OpenAI-format API requests:

curl https://api.runpod.ai/v2/5abxxxxxxsk5a/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer rpa_2BFKI96Y02Qxxxxxxxxxxu6u4i" \
  -d '{
    "model": "/models/huggingface-cache/hub/models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ/snapshots/e7b623934290c5a4da0ee3c6e1e57bfb6b5abbf2",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://site.com/image.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'

But I would like to make these calls through RunPod's asynchronous /run endpoint instead. Is that possible, and if so, what request format does /run expect? I can't find it documented anywhere.
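One possible approach, sketched under assumptions rather than confirmed: RunPod's /run takes a JSON body whose payload sits under an "input" key, returns a job id, and the result is fetched later from /status/{id}. The sketch further assumes that worker-vllm forwards an OpenAI-style request when the job input carries "openai_route" (here "/v1/chat/completions") and "openai_input" (the unchanged OpenAI body). Those two key names come from my reading of worker-vllm's handler and should be checked against the worker version you deploy. ENDPOINT_ID and RUNPOD_API_KEY are hypothetical environment-variable names standing in for the redacted endpoint id and key.

```shell
#!/usr/bin/env bash
# Sketch: send the same chat request through RunPod's async /run endpoint.
# ASSUMPTION: worker-vllm routes jobs whose input contains "openai_route" and
# "openai_input" to its OpenAI-compatible handler; verify for your version.
# ENDPOINT_ID and RUNPOD_API_KEY are hypothetical env vars you must set.

# Print the /run body: the original OpenAI request nested under
# input.openai_input, with the OpenAI path in input.openai_route.
run_payload() {
  cat <<'EOF'
{
  "input": {
    "openai_route": "/v1/chat/completions",
    "openai_input": {
      "model": "/models/huggingface-cache/hub/models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ/snapshots/e7b623934290c5a4da0ee3c6e1e57bfb6b5abbf2",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://site.com/image.jpg"}}
          ]
        }
      ],
      "max_tokens": 1000,
      "temperature": 0.7
    }
  }
}
EOF
}

# Queue the job; curl reads the body from stdin via "-d @-".
# The JSON response should contain the job "id".
submit_job() {
  run_payload | curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/run" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
    -d @-
}

# Fetch the status of a queued job by id (first argument); once the
# status is COMPLETED, the response should include the job's output.
job_status() {
  curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/status/$1" \
    -H "Authorization: Bearer ${RUNPOD_API_KEY}"
}
```

Usage, with both variables exported: run `submit_job`, copy the "id" from its response, then call `job_status <id>` until the reported status is COMPLETED. If a blocking call is acceptable, the same body can be posted to /runsync instead of /run.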

AlexisMDP, Aug 14 '25 14:08