
[Feature]: OpenAI Responses API Support

Open krrishdholakia opened this issue 10 months ago • 35 comments

The Feature

Parent ticket to track support for the new OpenAI Responses API.

Based on Discord, this will be supported as a separate API spec instead of trying to translate into /chat/completions. https://discord.com/channels/1123360753068540065/1139937429588021289/1349086515665305671

Checklist

Checklist for create responses endpoint

  • [x] Non-streaming Async
  • [x] Non-streaming Sync
  • [x] Streaming Async
  • [x] Streaming Sync
  • [x] Non-streaming logging + cost tracking
  • [x] Streaming logging + cost tracking
  • [x] litellm.router support

Proxy Checklist (a client-side sketch follows this list)

  • [x] POST /responses OpenAI non-streaming
  • [x] POST /responses OpenAI non-streaming - logging + cost tracking
  • [x] POST /responses OpenAI streaming
  • [x] POST /responses OpenAI streaming - logging + cost tracking
  • [ ] Get Response: GET https://api.openai.com/v1/responses/{response_id}
  • [x] Delete Response: DELETE https://api.openai.com/v1/responses/{response_id}
  • [ ] List Input Items: GET https://api.openai.com/v1/responses/{response_id}/input_items
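
As a point of reference, here is a minimal client-side sketch against the proxy. It assumes a LiteLLM proxy listening on localhost:4000 with a placeholder key and model alias, and a recent openai Python SDK that ships the Responses client; the retrieve and input-items calls correspond to the still-unchecked items above.

```python
import openai

# Assumed setup: LiteLLM proxy on localhost:4000, placeholder API key,
# and a model alias ("gpt-4o") configured on the proxy.
client = openai.OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

# POST /responses
resp = client.responses.create(
    model="gpt-4o",
    input="Write a one-line haiku about proxies.",
)
print(resp.id)

# GET /responses/{response_id}
client.responses.retrieve(resp.id)

# GET /responses/{response_id}/input_items
client.responses.input_items.list(resp.id)

# DELETE /responses/{response_id}
client.responses.delete(resp.id)
```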

Motivation, pitch

Make it easy to give devs LLM access.

Are you a ML Ops Team?

No

Twitter / LinkedIn details

No response

krrishdholakia avatar Mar 11 '25 18:03 krrishdholakia

Great news!

mvrodrig avatar Mar 12 '25 13:03 mvrodrig

Initial PR here: https://github.com/BerriAI/litellm/pull/9155

ishaan-jaff avatar Mar 12 '25 19:03 ishaan-jaff

would be great to get this for Azure as well given: https://azure.microsoft.com/en-us/blog/announcing-the-responses-api-and-computer-using-agent-in-azure-ai-foundry/

taralika avatar Mar 13 '25 16:03 taralika

Hey @taralika, our account doesn't have access. Does yours?

krrishdholakia avatar Mar 13 '25 16:03 krrishdholakia

OpenAI Responses API on the LiteLLM Proxy here: https://github.com/BerriAI/litellm/pull/9183

ishaan-jaff avatar Mar 13 '25 17:03 ishaan-jaff

@taralika - can you share how you deploy the Responses API on Azure AI Foundry? I don't see it.

[screenshot attached]

ishaan-jaff avatar Mar 13 '25 17:03 ishaan-jaff

Yeah, I don't see it either... I missed that the article says "available in a few weeks" 🤦‍♂️

taralika avatar Mar 13 '25 17:03 taralika

OK, we'll wait for support from the Azure API and add day-0 support for it once it's out.

ishaan-jaff avatar Mar 13 '25 17:03 ishaan-jaff

The Responses API is live here; can users on this issue thread help beta test it? (@mvrodrig, @blairhudson, @jskalant)

Release: https://github.com/BerriAI/litellm/releases/tag/v1.63.8-nightly

Doc: https://docs.litellm.ai/docs/response_api
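
If it helps testers get started, here is a minimal sketch based on the doc above; the model name, prompt, and max_output_tokens value are just placeholders, not a definitive reference:

```python
import litellm

# Non-streaming
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
)
print(response)

# Streaming: iterate over the emitted events
stream = litellm.responses(
    model="openai/o1-pro",
    input="Same story, but streamed.",
    stream=True,
)
for event in stream:
    print(event)
```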

ishaan-jaff avatar Mar 13 '25 22:03 ishaan-jaff

you guys are crazy! big props!

yigitkonur avatar Mar 22 '25 14:03 yigitkonur

@ishaan-jaff it looks like Responses is available in Azure now with the 2025-03-01-preview API. We're hoping to see this added to LiteLLM soon :)

https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure

marty-sullivan avatar Mar 26 '25 15:03 marty-sullivan

Does it support Bedrock as well? I saw that on OpenAI this enables Computer Use, and Bedrock is adding it too, so allowing Computer Use with Bedrock, for instance, will be a must.

OriNachum avatar Mar 26 '25 22:03 OriNachum

@OriNachum adding support for Bedrock on this endpoint is on our roadmap too

ishaan-jaff avatar Mar 26 '25 23:03 ishaan-jaff

It would be cool to be able to convert to the chat API, to use o1-pro in apps that can only use chat.

tiagoefreitas avatar Apr 04 '25 15:04 tiagoefreitas

hi @tiagoefreitas, noted. That's a good request. We can support this. Tracking here: https://github.com/BerriAI/litellm/issues/9754

would you be willing to give us feedback on the initial integration @tiagoefreitas ?

ishaan-jaff avatar Apr 04 '25 16:04 ishaan-jaff

Yes, I can test it.

tiagoefreitas avatar Apr 04 '25 17:04 tiagoefreitas

Can we add this convenient output_text property to litellm's ResponsesAPIResponse?

This is available in openai-responses-python.
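
For anyone who needs this before the property lands, here is a hypothetical helper that approximates what the output_text convenience does, assuming the standard Responses output shape (message items containing output_text content parts):

```python
def output_text(response) -> str:
    # Hypothetical helper; field names assume the standard Responses shape:
    # response.output -> "message" items -> content parts of type "output_text".
    chunks = []
    for item in getattr(response, "output", None) or []:
        if getattr(item, "type", None) != "message":
            continue
        for part in getattr(item, "content", None) or []:
            if getattr(part, "type", None) == "output_text":
                chunks.append(part.text)
    return "".join(chunks)
```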

r-sniper avatar Apr 08 '25 08:04 r-sniper

@krrishdholakia @ishaan-jaff just highlighting SamA's tweet about adding MCP support to the Responses API

Any interest in making MCP servers configured in LiteLLM also available through Responses API?

blairhudson avatar Apr 18 '25 23:04 blairhudson

cc: @ishaan-jaff - since you're working on responses api support

krrishdholakia avatar Apr 19 '25 01:04 krrishdholakia

You may want to rely on my package:

Package: https://pypi.org/project/openai-responses-server/
Code: https://github.com/teabranch/openai-responses-server/

It bridges the Responses API to the chat completions endpoint by managing event state.
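
For context, here is a rough sketch of the kind of streaming translation such a bridge performs; the event type names follow the public Responses streaming spec, while the function name and chunk shape are purely illustrative:

```python
from typing import Dict, Iterable, Iterator

def chat_chunks_to_response_events(chunks: Iterable[Dict]) -> Iterator[Dict]:
    # Illustrative only: re-emit chat.completion.chunk text deltas as
    # Responses-style streaming events.
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield {"type": "response.output_text.delta", "delta": delta}
    yield {"type": "response.completed"}
```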

OriNachum avatar Apr 26 '25 16:04 OriNachum

We already do the transformation between the /responses API and /chat/completions, @OriNachum: https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57
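
In other words, something like the following should work (a minimal sketch; the model name is only a placeholder for a provider that speaks chat completions but not the Responses API):

```python
import litellm

# LiteLLM's transformation layer (linked above) maps this Responses-style
# call onto /chat/completions for the provider and maps the result back.
response = litellm.responses(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Summarize the Responses API in one sentence.",
)
print(response)
```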

krrishdholakia avatar Apr 26 '25 16:04 krrishdholakia

@krrishdholakia how do you handle tool use? Event management is handled differently.

Also, the Responses API provides Code Interpreter, file search, web search, MCPs (soon), and computer use.

Do you support these?

OriNachum avatar Apr 26 '25 19:04 OriNachum

@OriNachum can you share an example of what you're trying to do?

Not sure I follow.

krrishdholakia avatar Apr 26 '25 19:04 krrishdholakia

@krrishdholakia This is what we're doing: a full implementation of https://platform.openai.com/docs/api-reference/responses

where the AI provider just needs to support the chat completions endpoint.

From the link: "Extend the model's capabilities with built-in tools for file search, web search, computer use, and more. Allow the model access to external systems and data using function calling."

OriNachum avatar Apr 27 '25 03:04 OriNachum

That's interesting, so you're writing not only a format translator for Responses->Chat, but a server that implements the actual features OpenAI offers in Responses (search, code interpreter, file search), like the features in the old Assistants API, is that right?

What do you support so far? That seems very useful!

RodolfoCastanheira avatar Apr 27 '25 22:04 RodolfoCastanheira

Confirmed!

Right now I support text and tool use in streaming (the basics). I need to validate non-streaming again, but that's a simple use case.

I'm working on adding files endpoint integration: the ability to upload files, index them, and search them semantically. I'm basing this on a RagFlow integration.

Then I will add either Code Interpreter or web search (no particular priority between them, unless asked).

If you're looking for or waiting on something specific, you can open an issue.

OriNachum avatar Apr 28 '25 13:04 OriNachum

@RodolfoCastanheira continuing my previous reply, I plan on adding web search and file search in the coming days.

File search: probably support for graphiti first, then an option for RagFlow (they only officially support x86, and I'm also aiming at NVIDIA Jetson ARM devices).

Web search: still researching. For a web solution, Tavili is a strong candidate, but I also want a local solution (for privacy).

OriNachum avatar Apr 30 '25 06:04 OriNachum

Greetings folks,

I'm on a journey to get Codex running with local models and, as you might be aware, it uses the Responses API deeply inside (and the maintainers refuse to patch it to allow locally running models). I finally got to this issue, tried out the vLLM backend with LiteLLM, and can confirm it's not working.

Assumptions I'm making below, which could potentially be wrong:

  1. LiteLLM is trying to forward the /v1/responses request to the backend at https://vllm-endpoint/v1/responses
  2. The backend (vLLM) doesn't support this endpoint, so it returns a 500 error
  3. This is causing a LiteLLM error: 'NoneType' object has no attribute 'get'

I'm interested in working out a solution that can use LiteLLM but internally shuffles around and rewrites events, not unlike @OriNachum's implementation (I presume).

Or do you think retrofitting Responses onto /v1/chat/completions doesn't make sense to do?

I think broader vLLM-ecosystem acceptance of Responses could take longer because of its statefulness.

Seeing as this beta build has -e STORE_MODEL_IN_DB=True, my guess is that LiteLLM is not afraid to tackle this statefulness in the proxy. If there's something I can help with to align a PR for transforming/shuffling data before it gets passed to vLLM and back, I'd like some pointers to get started; if it overlaps with vLLM too, I can try to open PRs there as well and get things working together.

Any direction or pointers on where to begin looking would be greatly appreciated.

theycallmeloki avatar May 02 '25 16:05 theycallmeloki

@theycallmeloki why not use my server as a proxy?

The Responses API is not just a tunnel and event-state manager on top of chat completions; it also provides hosted tool support: web search, file search, computer use.

And future tools: code interpreter and an MCP manager.

OriNachum avatar May 02 '25 16:05 OriNachum

Ah, I'm already using the LiteLLM proxy, and it's tricky to fine-tune without chat completions (for /v1/responses fine-tuning I wouldn't even know where to begin looking). The earlier half of my day went into trying to figure out a way to use this open-responses, but halfway through they wanted me to deploy an entire stack of microservices just to get a proxy that speaks the OpenAI Responses API (mostly due to the statefulness involved, like the one I mentioned above), and I gave up after failing to get Postgres up for it.

I did look through your solution too, but dropped off as I reached towards babytau because it seemed like it was meant more for ARM, and I am completely on x86. I understand I can use this link as a potential starter kit, but I'm not too sure that would work, or how I need to involve the parent repo here or its deployable counterpart here.

If you could make a Dockerfile with just the steps necessary to accomplish the following, I think not just me but potentially many more people who come across this will gladly use your work. Steps:

  1. I should be able to pip install openai-responses-server in a step in a Dockerfile
  2. I should be able to specify the OpenAI base URL and supported models (in your example, http://localhost:11434 with llama-3b should be fine, but I do need some programmatic way to configure this, potentially in the following step as a command line argument? Or, like Codex, I don't mind env variables either)
  3. I should be able to spin up python -m openai-responses-server 8080, which acts as a 1:1 just-in-time proxy bridging the disparate calls into regular /v1/chat/completions tool calls that work with the local Ollama and passes responses back upstream

theycallmeloki avatar May 02 '25 17:05 theycallmeloki