worker-vllm
How to call the API with the RunPod /run format?
I have a vLLM Serverless Endpoint running Qwen2.5-VL-3B-Instruct. I call it with OpenAI-format API calls:
curl https://api.runpod.ai/v2/5abxxxxxxsk5a/openai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer rpa_2BFKI96Y02Qxxxxxxxxxxu6u4i" \
-d '{
"model": "/models/huggingface-cache/hub/models--Qwen--Qwen2.5-VL-3B-Instruct-AWQ/snapshots/e7b623934290c5a4da0ee3c6e1e57bfb6b5abbf2",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image."
},
{
"type": "image_url",
"image_url": {
"url": "https://site.com/image.jpg"
}
}
]
}
],
"max_tokens": 1000,
"temperature": 0.7
}'
But I would like to make the same calls with the RunPod async format (/run).
Is it possible? And how?
I can't find the proper format...
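One possible shape for the /run request, as a minimal sketch rather than a confirmed answer: it assumes the worker's handler accepts an "input" object with the keys "openai_route" and "openai_input" and forwards the wrapped body to the same OpenAI-compatible route. Those two key names are an assumption not confirmed anywhere in this thread, so check them against the worker-vllm version you deploy. The helper names build_run_request, status_url and submit are hypothetical, and the endpoint ID, API key and model path are placeholders.

```python
import json
import urllib.request

RUNPOD_BASE = "https://api.runpod.ai/v2"


def build_run_request(endpoint_id, api_key, openai_body):
    """Wrap an OpenAI-style chat body in the RunPod /run envelope.

    ASSUMPTION: the worker reads "openai_route" and "openai_input"
    from the job input. Verify this against your worker version.
    """
    payload = {
        "input": {
            "openai_route": "/v1/chat/completions",
            "openai_input": openai_body,
        }
    }
    return urllib.request.Request(
        url=f"{RUNPOD_BASE}/{endpoint_id}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def status_url(endpoint_id, job_id):
    """URL to poll for the result of a job submitted through /run."""
    return f"{RUNPOD_BASE}/{endpoint_id}/status/{job_id}"


def submit(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the curl call above; the model path is a placeholder.
chat_body = {
    "model": "<model path or name>",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://site.com/image.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.7,
}

# Built but not sent here; call submit(request) to actually queue the job.
request = build_run_request("<endpoint_id>", "<api_key>", chat_body)
```

Calling submit(request) should return a JSON object containing the job "id". Polling status_url(endpoint_id, job_id) with a GET request and the same Authorization header then returns the job status and, once the job has completed, its output.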