Support for tools / tool_choice="auto" in OpenAI-compatible API

Open · TimPietrusky opened this issue 1 year ago • 24 comments

The purpose of this issue is to add full support for tools and tool_choice="auto" to worker-vllm.

ToDo

  • [ ] vLLM currently only supports tool_choice="some_tool_name" and tool_choice="none"; hopefully tool_choice="auto" will follow soon, see https://github.com/vllm-project/vllm/pull/5649
  • [x] Update vLLM from 0.4.2 to 0.5.x to get basic tools support, see #82

Why do you want this?

This will finally make it possible to use vLLM to build AI agents.

What have you tried instead?

I ran into these errors in the logs on RunPod while testing tools using worker-vllm:

LangChain

{
  "delayTime": 1135,
  "executionTime": 445,
  "id": "sync-123-123-123-123-123-123-123",
  "output": [
    {
      "code": 400,
      "message": "1 validation error for ChatCompletionRequest\nfunctions\n  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'description': 'Get the...n'], 'type': 'object'}}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden",
      "object": "error",
      "param": null,
      "type": "BadRequestError"
    }
  ],
  "status": "COMPLETED"
}

Vercel AI SDK

{
  "delayTime": 1106,
  "executionTime": 401,
  "id": "97c5be32-123-123-123-123-123",
  "output": [
    {
      "code": 400,
      "message": "2 validation errors for ChatCompletionRequest\ntool_choice\n  Extra inputs are not permitted [type=extra_forbidden, input_value='auto', input_type=str]\n    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden\ntools\n  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'function': {'descripti...}}, 'type': 'function'}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden",
      "object": "error",
      "param": null,
      "type": "BadRequestError"
    }
  ],
  "status": "COMPLETED"
}

TimPietrusky avatar Jul 19 '24 15:07 TimPietrusky

vLLM now supports named function calling in the chat completion API for models that support it. However, the tool_choice options auto and required are not yet supported, but they are on the roadmap.

To use a named function, you need to explicitly select it in the tool_choice parameter. I haven't tried it personally yet.
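
For context, here is a minimal sketch of what such a named tool call could look like against an OpenAI-compatible vLLM endpoint. This is untested; the base URL, API key, model, and the get_current_weather schema are made-up placeholders, not taken from the reports above:

# Untested sketch: forcing a named tool call via an OpenAI-compatible endpoint.
# Base URL, API key, model, and tool schema are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",  # placeholder
    api_key="<runpod_api_key>",  # placeholder
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # hypothetical example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    # Named tool choice: the only variant vLLM supports at this point,
    # since it can force the call via guided decoding.
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)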

  • https://github.com/vllm-project/vllm/issues/1869
  • https://vllm.readthedocs.io/_/downloads/en/stable/pdf/
  • https://github.com/dongxiaolong/open-source-function-calling/blob/main/openhermes-functions-with-vllm.ipynb

There's an open PR to add auto tool choice from a set of tools.

  • https://github.com/vllm-project/vllm/pull/5649

Saran33 avatar Jul 19 '24 17:07 Saran33

@Saran33 thanks for the info.

I actually tried what you suggested by explicitly naming my function, but this also leads to errors.

TypeScript

When you set the name of the function as your tool_choice, you will get errors from TypeScript, as only none, required, or auto are allowed. You can work around this by suppressing the error with // @ts-ignore.

Vercel AI SDK

When the AI SDK tries to run the code, it will immediately complain that the passed-in function name is not an allowed option for tool_choice.

You can't get around this error.

LangChain

{
  "delayTime": 1093,
  "executionTime": 142,
  "id": "sync-123-123-123-123-123-123",
  "output": [
    {
      "code": 400,
      "message": "2 validation errors for ChatCompletionRequest\ntool_choice\n  Extra inputs are not permitted [type=extra_forbidden, input_value='get_current_weather', input_type=str]\n    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden\ntools\n  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'description': 'Get the...n'], 'type': 'object'}}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden",
      "object": "error",
      "param": null,
      "type": "BadRequestError"
    }
  ],
  "status": "COMPLETED"
}

TimPietrusky avatar Jul 20 '24 14:07 TimPietrusky

Are you running into this on the latest release of vLLM, or from a specific branch/PR?

K-Mistele avatar Jul 21 '24 20:07 K-Mistele

@K-Mistele I'm using the image that comes from this repository: runpod/worker-vllm:stable-cuda12.1.0. According to the submodule commit, this image is on vLLM 0.4.2.

I guess my original post was not clear: what I want is for this worker to provide this feature, and right now it doesn't. I'm aware that I could use vLLM directly, but I want to run it on RunPod with the official image.

TimPietrusky avatar Jul 22 '24 14:07 TimPietrusky

@TimPietrusky ah yes, looking at the releases, it was only introduced in vLLM 0.5.0. +1 on bumping the vLLM version to support this feature. To the best of my knowledge, other services like NVIDIA NIM don't support tool calling either when testing their endpoints with the example from the OpenAI docs, so it would definitely make RunPod even more competitive. It looks like there are already a few open issues related to bumping vLLM:

  • https://github.com/runpod-workers/worker-vllm/issues/80
  • https://github.com/runpod-workers/worker-vllm/issues/83
  • https://github.com/runpod-workers/worker-vllm/issues/84

Saran33 avatar Jul 22 '24 14:07 Saran33

@Saran33 yeah, that would be great!

TimPietrusky avatar Jul 22 '24 15:07 TimPietrusky

I've been trying to use LangGraph's ChatOpenAI to bind tools on vLLM via the OpenAI API (Meta-Llama-3-70B-Instruct).

I've modified the extra argument of ConfigDict to "allow" in the OpenAIBaseModel class, as shown here:

# from vllm/entrypoints/openai/protocol.py
from pydantic import BaseModel, ConfigDict

class OpenAIBaseModel(BaseModel):
    # The OpenAI API does not allow extra fields
    model_config = ConfigDict(extra="allow")  # instead of the default extra="forbid"

With this modification, I've avoided the forbidden-extra-inputs error and the model responds. However, tools are never called; presumably the extra fields now pass validation but are silently ignored, so the tool definitions never reach the model.

eduardozamudio avatar Jul 23 '24 22:07 eduardozamudio

@eduardozamudio thanks for trying this! And it's super sad that it's not working :/

@pandyamarut what do you need to get this update going? How can we provide support?

TimPietrusky avatar Jul 26 '24 21:07 TimPietrusky

Yeah, so auto tool choice doesn't work in vLLM yet; this is documented. I am working on adding it, but at this time tools will only work if you specify the tool to call with tool_choice, since vLLM can use Outlines to force the call. tool_choice="auto" and tool_choice=None are not supported; only tool_choice="some_tool_name" and tool_choice="none" are.

K-Mistele avatar Jul 26 '24 21:07 K-Mistele

@K-Mistele awesome, THANK YOU VERY MUCH!!!

TimPietrusky avatar Jul 26 '24 22:07 TimPietrusky

@TimPietrusky, really appreciate your help. You will hear back super soon. Things have changed a bit, and I'm working on it.

pandyamarut avatar Jul 26 '24 23:07 pandyamarut

@pandyamarut you are super welcome! I'm very excited to get this done, as we can then create AI agents on RunPod.

(I also updated the issue again to make it clearer what is happening here and why.)

TimPietrusky avatar Jul 27 '24 10:07 TimPietrusky

@TimPietrusky #82 is updated. I am currently doing a sanity check before it's fully released. If you want to try it out, you can build an image and deploy a serverless worker; please let me know about any issues and feedback you have.

pandyamarut avatar Jul 27 '24 23:07 pandyamarut

@pandyamarut thank you very much! Will test it tomorrow 🙏

TimPietrusky avatar Jul 28 '24 08:07 TimPietrusky

Hey, actually I tried building my serverless image for this vllm-worker. I had some complications while building (because it's big, etc.), but I got it pushed. After I deployed it to a serverless endpoint, I couldn't see the OpenAI-compatible URL. Why is that? I also noticed that it only seems to appear for specific image tags matching runpod/vllm-* (wildcard).

This is also why I'm still waiting for the vllm-worker update. It's great to have you focused on this repo 😊

nerdylive123 avatar Jul 30 '24 01:07 nerdylive123

Yeah, that's specific to vllm-*, but you should still be able to use /v1/openai to test.
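
For reference, a minimal sketch of smoke-testing that route with the OpenAI client. This is untested; the endpoint ID, API key, and exact base path are placeholders, so adjust them to whatever your endpoint actually exposes:

# Untested sketch: verify the OpenAI-compatible route responds.
# Endpoint ID, API key, and base path are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",  # placeholder path
    api_key="<runpod_api_key>",
)

# Listing models is a quick way to confirm the route is reachable.
print(client.models.list())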

For faster updates, some changes were made that made the image heavier; I will look at decreasing that as well.

pandyamarut avatar Jul 30 '24 01:07 pandyamarut

Oh wow, okay, thanks for the info! I didn't know that was possible.

nerdylive123 avatar Aug 01 '24 05:08 nerdylive123

@K-Mistele thank you so much for putting so much time and energy into resolving all the things in your PR!!! We are so much looking forward to getting this released in vLLM, you can't even imagine.

TimPietrusky avatar Aug 06 '24 14:08 TimPietrusky

Thanks! Really looking forward to pushing this out

K-Mistele avatar Aug 06 '24 15:08 K-Mistele

Thanks to @pandyamarut, vLLM has been updated to 0.5.3, which means we can actually use tools via https://github.com/runpod-workers/worker-vllm/releases/tag/v1.1.

Will make sure to test this!

TimPietrusky avatar Aug 06 '24 18:08 TimPietrusky

The PR was finally merged; looking forward to a new release.

Thank you @K-Mistele for putting SO MUCH WORK into this!!!

TimPietruskyRunPod avatar Sep 05 '24 11:09 TimPietruskyRunPod

@TimPietruskyRunPod no problem :) If you'd like, I'm happy to keep you posted on future enhancements to this feature - I have already begun work on adding support for Llama 3.1's tool format

K-Mistele avatar Sep 05 '24 11:09 K-Mistele

@K-Mistele oh yes please, that would be awesome!

(still getting used to having two accounts :D)

TimPietrusky avatar Sep 05 '24 13:09 TimPietrusky

Is the issue solved?

SmartManoj avatar Feb 18 '25 04:02 SmartManoj