[Frontend] Implement Tool Calling with `tool_choice='required'`
[Draft]
This (Draft) PR will implement tool_choice='required' for the Tool Calling API, a requested feature for the OpenAI-compatible API server (#10526 and #13002).
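For context, here is a minimal client-side sketch of what this enables, assuming a vLLM OpenAI-compatible server running at `http://localhost:8000/v1`; the model name and `get_weather` tool are illustrative, not part of this PR:

```python
# Minimal sketch of the client side of tool_choice="required".
# Assumes a vLLM OpenAI-compatible server at http://localhost:8000/v1;
# the model and tool names here are purely illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# With tool_choice="required", the server must return at least one tool call
# rather than a plain-text answer.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Vienna?"}],
    tools=tools,
    tool_choice="required",
)
print(response.choices[0].message.tool_calls)
```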
Roadmap
1. Validation and Guided Decoding
- [x] Extend validation to support tool_choice='required'.
- [x] Implement structured decoding for tool_choice='required'.
2. Chat Completions
- [x] Implement/extend non-streaming chat completions.
- [x] Extend to streaming chat completions (see the client-side streaming sketch after this list).
3. Documentation
- [ ] Update documentation.
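To make the streaming item above concrete, here is a minimal sketch of consuming streamed tool calls on the client side, reusing the `client` and `tools` from the earlier sketch (model name again illustrative):

```python
# Sketch of the streaming case: tool-call arguments arrive as incremental
# deltas that the client concatenates per tool-call index.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Vienna?"}],
    tools=tools,
    tool_choice="required",
    stream=True,
)

calls = {}  # tool-call index -> accumulated {"name", "arguments"}
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments

# e.g. {0: {"name": "get_weather", "arguments": '{"city": "Vienna"}'}}
print(calls)
```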
I'm looking for feedback on the implementation roadmap, and collaboration is welcome!
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
Excited for this to get merged!
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @meffmadd.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Hi @mgoin, I finished the basic implementation last week and I am pinging you to have a look yourself or assign reviewers. Thanks! 🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @meffmadd.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
The latest merge did not include the changes from #14511 for the required case. Should that probably be done in a new PR?
@simon-mo @DarkLight1337 @robertgshaw2-redhat pinging for review/feedback.
Hey guys, let's get this in?
Second this, it’s blocking downstream development
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @meffmadd.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Waiting for the merge, please 🙏
@NickLucche I trust your judgement, can you approve the PR if you think it's ready?
Taking another pass
Thanks for the feedback @NickLucche! I will add e2e tests ASAP!
The example still works for me locally. However, I am working on an M1 MacBook Pro at the moment, which means I am on the CPU backend, which in turn disables the V1 engine (`device type=cpu is not supported by the V1 Engine. Falling back to V0.`). Then I get the log message: `xgrammar does not support advanced JSON schema features like enums, patterns or numeric ranges. Falling back to use outlines instead.`
Can you try adding `--guided-decoding-backend outlines` to the serve command and/or setting `VLLM_USE_V1=0`? Does that fix it for you? We should definitely add this to the documentation as well!
So I can't use `--guided-decoding-backend` with V1, meaning this would only work with V0. This also needs to be specified in the e2e tests I mentioned, as they ought to just fail on V1.
This is a bit unfortunate though, as most efforts are focused on V1 right now; I was hoping we could land this on V1.
I am not too familiar with xgrammar; would you know how much work it would take to get this working with V1? cc @mgoin
Based on the documentation, alternative decoding backends are planned in V1: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html#feature-model
For this implementation to work, the guided decoding backend needs to accept a JSON schema that uses anyOf to select the sub-schema for each of the functions in the input, so that the correct parameters are generated for it.
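To illustrate what I mean, here is a rough sketch of such a schema for two hypothetical functions (`get_weather`, `get_time`); the actual schema generated by this PR may differ:

```python
# Rough sketch of the kind of schema described above: the model must emit
# an array of tool calls, where each call matches one of the per-function
# sub-schemas via anyOf. Function names/parameters are illustrative only.
required_tools_schema = {
    "type": "array",
    "minItems": 1,
    "items": {
        "anyOf": [
            {  # sub-schema for a hypothetical get_weather function
                "type": "object",
                "properties": {
                    "name": {"const": "get_weather"},
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
                "required": ["name", "parameters"],
            },
            {  # sub-schema for a hypothetical get_time function
                "type": "object",
                "properties": {
                    "name": {"const": "get_time"},
                    "parameters": {
                        "type": "object",
                        "properties": {"timezone": {"type": "string"}},
                        "required": ["timezone"],
                    },
                },
                "required": ["name", "parameters"],
            },
        ]
    },
}
```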
Stop being nit picky about tests or your planned v0/v1 migration and merge this already!!!!!!!
Every day that this is not merged, vllm continues to be a laughing stock for any serious agent AI applications. AG2 continues to be a laughing stock for any serious agent AI applications (they only just merged support for this on their end 4 days ago).
It's insane that this wasn't literally priority 1 of an issue for your whole team for months.
Unblocked the tool calling tests; we can merge this if they all pass.
The new e2e test might fail, as I was just pushing my local changes before switching to my dev VM, where I can run/debug tests on GPU. Let's see...
Massive thank you to everyone involved!!! This is huge, and I will make a very positive post on LinkedIn soon about the new developments coming to vLLM (I didn't expect this to get merged within hours of me complaining!)
AI lives and dies off of the often unpaid or unrewarded labor of open source devs like yourselves. Please excuse my prior frustration; I am a researcher who relies on both vLLM and AG2, with stakeholders who have been waiting for local models to become useful for agents.
Giant thank you all!!!
I also got the error `The provided JSON schema contains features not supported by xgrammar` with Qwen2.5. Hopefully we can fix this soon.