
Feature Request: Support for tool calling in llm adapters

Open jezell opened this issue 9 months ago • 7 comments

Language models use JSON Schema, MCP uses JSON Schema, OpenAPI uses JSON Schema, but LiveKit uses python functions. This creates a mismatch between how tools actually interact with LLMs and the LiveKit API, which makes it challenging to support things like MCP and OpenAPI without mapping the schema to a python function and back again, as is done in the MCP sample here:

https://github.com/livekit-examples/basic-mcp

While this sample attempts to get the job done, it's full of complicated code that shouldn't need to be written in the first place. It only exists because of the constraint that tools must be python functions, which is not an LLM-native requirement but a LiveKit-imposed one.
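For illustration, this is roughly the kind of bridging that constraint forces. A simplified sketch with made-up names, not the actual basic-mcp code:

```python
import inspect
from typing import Any, Awaitable, Callable

def make_python_wrapper(
    name: str,
    schema: dict,
    call_remote_tool: Callable[[str, dict], Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    """Build a python function whose keyword parameters mirror schema['properties']."""
    props = schema.get("properties", {})
    required = set(schema.get("required", []))

    async def wrapper(**kwargs: Any) -> Any:
        # The arguments are already structured; forward them straight back to MCP.
        return await call_remote_tool(name, kwargs)

    # Synthesize a signature so introspection-based tooling sees "real" parameters.
    # Most type and constraint information from the schema is lost here, which is
    # the lossy conversion described above.
    wrapper.__signature__ = inspect.Signature([
        inspect.Parameter(
            prop,
            inspect.Parameter.KEYWORD_ONLY,
            default=inspect.Parameter.empty if prop in required else None,
        )
        for prop in props
    ])
    wrapper.__name__ = name
    wrapper.__doc__ = schema.get("description", "")
    return wrapper
```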

With the release of the Responses API, OpenAI has also added an alternative tool calling model for built-in tools, and it's not clear how to plug those tools into a LiveKit voice agent due to the way tool calling is implemented.
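For context, a built-in tool in the Responses API is selected by type rather than by handing over a callable, so there is no python function to wrap. Something like:

```python
# Sketch using the openai client: the built-in web search tool is declared by
# type only; there is no schema to define and no callable to register.
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "web_search_preview"}],  # built-in tool, no python function
    input="What's new in the latest livekit-agents release?",
)
print(response.output_text)
```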

In a perfect world, the tool calling process would be overridable / customizable by the llm adapter itself or by a tool calling adapter, so someone could write an adapter that integrates with native LLM capabilities or standards like MCP instead of managing a lossy conversion to python functions and back again. At the moment, tool calling is implemented in a central set of functions that the llm adapter is not even involved in, which makes it very hard to customize tool calling to use functionality LLMs already support without forking the entire SDK:

https://github.com/livekit/agents/blob/4c3d980d287b93d1fb4417f35098e33f178d2128/livekit-agents/livekit/agents/voice/agent_activity.py#L1315
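Concretely, the ask is for a customization point shaped roughly like this. This is an entirely hypothetical interface, just to illustrate where the hook would live:

```python
# Hypothetical sketch: the llm adapter (or a tool-calling adapter) owns how tool
# specs are produced and how tool calls are executed, instead of a central code
# path hard-coded to python functions.
from abc import ABC, abstractmethod
from typing import Any

class ToolCallingAdapter(ABC):
    @abstractmethod
    def export_tools(self) -> list[dict[str, Any]]:
        """Return provider-native tool specs (JSON Schema, built-in tools, MCP, ...)."""

    @abstractmethod
    async def execute(self, name: str, arguments: dict[str, Any]) -> Any:
        """Run the named tool with already-parsed arguments and return its result."""

class MCPToolAdapter(ToolCallingAdapter):
    def __init__(self, mcp_session):
        self._session = mcp_session  # an MCP client session (assumed)

    def export_tools(self) -> list[dict[str, Any]]:
        # MCP already provides JSON Schema; pass it through untouched.
        return [t.to_llm_spec() for t in self._session.tools]  # hypothetical helper

    async def execute(self, name: str, arguments: dict[str, Any]) -> Any:
        return await self._session.call_tool(name, arguments)
```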

jezell avatar Apr 10 '25 22:04 jezell

I filed an issue yesterday (https://github.com/livekit/agents/issues/1955) that is a manifestation of tools being represented as python functions in LiveKit. I'm in favor of any approach that allows us to specify the full breadth of JSON Schema.
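For example, constraints like anyOf or pattern are ordinary JSON Schema but have no direct counterpart in a python signature, so they get dropped in a function-only tool definition:

```python
# Example of the "full breadth" problem: this schema is easy for an LLM provider
# to consume but cannot be expressed as a plain python parameter list.
lookup_order_schema = {
    "type": "object",
    "properties": {
        "order_id": {
            "anyOf": [
                {"type": "string", "pattern": "^ORD-[0-9]{6}$"},
                {"type": "integer", "minimum": 1},
            ],
            "description": "Order reference as ORD-XXXXXX or a numeric id",
        },
    },
    "required": ["order_id"],
}
```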

yuyuma avatar Apr 10 '25 23:04 yuyuma

We have a pretty large set of tools we've built and have been using a pydantic model to define the arguments. It's worked very well: you get built-in validation, field descriptions without parsing docstrings, etc. It may not be the perfect data model for defining functions, but it's definitely a closer representation of JSON Schema than a plain old python function.
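A minimal sketch of that pattern (example model, not one of the real tools):

```python
# A pydantic model gives you validation, field descriptions, and a JSON Schema
# export without any docstring parsing.
from pydantic import BaseModel, Field

class TransferCallArgs(BaseModel):
    department: str = Field(description="Department to transfer the caller to")
    priority: int = Field(default=1, ge=1, le=5, description="Escalation priority")

# What the LLM would be shown, straight from the model:
print(TransferCallArgs.model_json_schema())

# Validation happens before the tool body ever runs:
args = TransferCallArgs.model_validate({"department": "billing", "priority": 3})
```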

For what it's worth, most of the other frameworks (OpenAI Agents, Pydantic AI, LangChain, etc.) all use the python-function model, but I haven't been able to track down why.

mnbbrown avatar Apr 16 '25 09:04 mnbbrown

The interesting case is https://github.com/livekit/agents/issues/1955#issuecomment-2795672931, and https://github.com/livekit/agents/blob/86017364d3cd50f08397311d66fc47788279820b/livekit-agents/livekit/agents/llm/utils.py#L172-L174 shows that internally the function is transformed into a pydantic model anyway, at least for the OpenAI and Anthropic LLMs.
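The general technique is simple enough; a simplified sketch of a signature-to-pydantic transformation (not the actual utils.py code) looks like:

```python
# Build a model from a function's parameters, then let the model produce the
# JSON Schema that gets sent to the provider.
import inspect
from pydantic import create_model

def model_from_signature(fn):
    fields = {}
    for name, param in inspect.signature(fn).parameters.items():
        if name in ("self", "ctx", "context"):
            continue  # runtime-only parameters are not part of the tool schema
        annotation = str if param.annotation is inspect.Parameter.empty else param.annotation
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (annotation, default)
    return create_model(f"{fn.__name__}_args", **fields)

def lookup_weather(ctx, city: str, unit: str = "celsius") -> str:
    ...

ArgsModel = model_from_signature(lookup_weather)
print(ArgsModel.model_json_schema())
```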

Proposing a couple of different directions:

  1. Update FunctionTool so it maintains an internal pydantic representation of the function. See existing here
  2. Maybe add another decorator fn that builds a FunctionTool from a different signature, something like ctx: ToolContext, args: PydanticModel (rough sketch below)

This is similar to what Vercel does, but with Zod instead of pydantic, and they also support optionally defining the JSON schema manually.
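A rough sketch of option 2, with placeholder names (pydantic_tool is made up, not the livekit-agents API):

```python
# Derive the tool schema from an explicit pydantic args model instead of from
# individual function parameters.
from pydantic import BaseModel, Field

class BookMeetingArgs(BaseModel):
    attendee: str = Field(description="Email of the person to invite")
    duration_minutes: int = Field(default=30, ge=15, le=120)

def pydantic_tool(fn):
    args_model: type[BaseModel] = fn.__annotations__["args"]
    fn.__tool_spec__ = {  # what an adapter would hand to the LLM
        "name": fn.__name__,
        "description": fn.__doc__ or "",
        "parameters": args_model.model_json_schema(),
    }
    return fn

@pydantic_tool
async def book_meeting(ctx, args: BookMeetingArgs) -> str:
    """Book a meeting on the caller's calendar."""
    return f"Booked {args.duration_minutes} minutes with {args.attendee}"
```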

Very happy to put together a PR

mnbbrown avatar Apr 16 '25 09:04 mnbbrown

@theomonnom saw your PR 🙌
Any chance raw function tools could support context - similar to normal function tools? Happy to put up a quick PR iterating on https://github.com/elyosenergy/agents/blob/c839ac2fcbdf1b80f159095e4ea22f676aa1f05e/livekit-agents/livekit/agents/voice/generation.py#L341-L350
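For reference, the shape I have in mind (decorator and parameter names assumed from the PR discussion, not a confirmed API):

```python
# Hypothetical shape of a raw function tool that also receives RunContext.
from livekit.agents import RunContext, function_tool

@function_tool(raw_schema={
    "name": "lookup_account",
    "description": "Look up an account by id",
    "parameters": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
})
async def lookup_account(raw_arguments: dict, context: RunContext) -> str:
    # raw_arguments arrives exactly as the LLM produced it, with no python
    # signature in between; context gives access to the running session just
    # like it does for normal function tools.
    return f"Account {raw_arguments['account_id']} is active"
```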

mnbbrown avatar Apr 22 '25 10:04 mnbbrown

Couldn't help myself https://github.com/livekit/agents/pull/2073

mnbbrown avatar Apr 22 '25 11:04 mnbbrown

> @theomonnom saw your PR 🙌 Any chance raw function tools could support context - similar to normal function tools? Happy to put up a quick PR iterating on https://github.com/elyosenergy/agents/blob/c839ac2fcbdf1b80f159095e4ea22f676aa1f05e/livekit-agents/livekit/agents/voice/generation.py#L341-L350

Thanks, I moved your PR here

theomonnom avatar Apr 22 '25 12:04 theomonnom

@mnbbrown Yeah, pydantic is pretty decent. Not as good as having direct access to the lower levels for a lot of cases, but it's really hard to beat for the higher level interface.

jezell avatar Apr 25 '25 04:04 jezell

I just started implementing some tool calling, and I've noticed this pattern seems to work well so far:

```python
from livekit.agents import Agent, RunContext, function_tool
from pydantic import BaseModel, Field

class MyModel(BaseModel):
    field_1: list[str] = Field(description='...')
    field_2: SomeEnum = Field(description='...')  # SomeEnum: an Enum defined elsewhere

class MyAgent(Agent):
    @function_tool
    def my_tool(self, context: RunContext, special_object: MyModel): ...
```

Are there any downsides or gotchas I should be aware of with this approach, as opposed to specifying individual fields as tool function arguments?
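For reference, the individual-argument alternative I'm comparing against looks like this (same tool, no wrapper model; imports and SomeEnum as in the snippet above):

```python
# Each argument declared directly on the function signature, with descriptions
# typically pulled from the docstring instead of Field(...).
class MyAgent(Agent):
    @function_tool
    def my_tool(self, context: RunContext, field_1: list[str], field_2: SomeEnum):
        """Do the thing.

        Args:
            field_1: ...
            field_2: ...
        """
        ...
```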

levity avatar Jul 17 '25 17:07 levity

There is a maximum nesting depth of 5 levels on OpenAI schemas, so that's something to keep in mind if you start doing anything with deeper nesting.
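A quick illustration of how fast nesting adds up (roughly, each nested object or array contributes a level of schema depth):

```python
# Sketch: ToolArgs -> Order -> items (array) -> LineItem already stacks several
# levels, so wrapper models like these can approach the limit quickly.
from pydantic import BaseModel

class LineItem(BaseModel):
    sku: str
    quantity: int

class Order(BaseModel):
    items: list[LineItem]

class ToolArgs(BaseModel):
    order: Order
```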

jezell avatar Jul 22 '25 19:07 jezell