[Feature]: OpenAI Responses API Support
The Feature
Parent ticket to track support for the new OpenAI Responses API.
Based on the Discord discussion, this will be supported as a separate API spec instead of being translated into /chat/completions. https://discord.com/channels/1123360753068540065/1139937429588021289/1349086515665305671
Checklist
Checklist for create responses endpoint
- [x] Non-streaming Async
- [x] Non-streaming Sync
- [x] Streaming Async
- [x] Streaming Sync
- [x] Non-streaming logging + cost tracking
- [x] Streaming logging + cost tracking
- [x] litellm.router support
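For reference, a minimal sketch of the intended SDK surface (model name is just an example, parameter names mirror OpenAI's Responses API, and OPENAI_API_KEY is assumed to be set; see the docs for the exact kwargs):

```python
import litellm

# Non-streaming, sync: create a response
response = litellm.responses(
    model="openai/gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
)
print(response)

# Streaming, sync: iterate events as they arrive
stream = litellm.responses(
    model="openai/gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
)
for event in stream:
    print(event)
```

Async and router usage should follow the same shape as the rest of the SDK (presumably via litellm.aresponses, mirroring the acompletion pattern).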
Proxy Checklist
- [x] post /responses OpenAI non-streaming
- [x] post /responses OpenAI non-streaming - logging + cost tracking
- [x] post /responses OpenAI streaming
- [x] post /responses OpenAI streaming - logging + cost tracking
- [ ] Get Response: get https://api.openai.com/v1/responses/{response_id}
- [x] Delete Response: delete https://api.openai.com/v1/responses/{response_id}
- [ ] List Input Items: get https://api.openai.com/v1/responses/{response_id}/input_items
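For reference, a rough sketch of exercising these proxy routes with the official OpenAI client pointed at LiteLLM (base_url, api_key, and model name are placeholders for your deployment):

```python
from openai import OpenAI

# Point the official OpenAI SDK at the LiteLLM proxy instead of api.openai.com
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# POST /responses (non-streaming)
resp = client.responses.create(
    model="gpt-4o",
    input="Write a one-line haiku about proxies.",
)

# POST /responses (streaming): iterate the event stream
stream = client.responses.create(model="gpt-4o", input="hi", stream=True)
for event in stream:
    print(event)

# DELETE /responses/{response_id} (checked above); the get and
# input_items routes follow the same pattern once supported
client.responses.delete(resp.id)
```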
Motivation, pitch
Make it easy to give devs LLM access.
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
Great news!
Initial PR here: https://github.com/BerriAI/litellm/pull/9155
Would be great to get this for Azure as well, given: https://azure.microsoft.com/en-us/blog/announcing-the-responses-api-and-computer-using-agent-in-azure-ai-foundry/
Hey @taralika our account doesn't have access. Does yours?
OpenAI Responses API on LiteLLM Proxy here: https://github.com/BerriAI/litellm/pull/9183
@taralika - can you share how you deploy the Responses API on Azure AI Foundry? I don't see it.
Yeah, I don't see it either... I missed that the article says "available in few weeks" 🤦‍♂️
OK, will wait for Azure API support. Will add day-0 support for it once it's out.
Responses API is live. Can users on this issue thread help beta test it? (@mvrodrig, @blairhudson, @jskalant)
Release: https://github.com/BerriAI/litellm/releases/tag/v1.63.8-nightly
Doc: https://docs.litellm.ai/docs/response_api
you guys are crazy! big props!
@ishaan-jaff it looks like Responses is available in Azure now with the 2025-03-01-preview API. We're hoping to see this added to LiteLLM soon :)
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure
Does it support Bedrock as well? I saw that on OpenAI's side this opens up Computer Use, and Bedrock is adding it too. So allowing Computer Use with Bedrock, for instance, will be a must.
@OriNachum adding support for Bedrock on this endpoint is on our roadmap too
It would be cool to be able to convert to the Chat API, so apps that can only use chat can use o1-pro.
hi @tiagoefreitas, noted. That's a good request. We can support this. Tracking here: https://github.com/BerriAI/litellm/issues/9754
would you be willing to give us feedback on the initial integration @tiagoefreitas ?
Yes I can test it
Can we add the convenient output_text property to litellm's ResponsesAPIResponse?
It is available in openai-responses-python.
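For reference, a small helper that approximates that property, assuming litellm's ResponsesAPIResponse mirrors the OpenAI Responses output schema (output items of type "message" holding "output_text" content blocks):

```python
def output_text(response) -> str:
    """Concatenate all generated text fragments from a Responses API result.

    Assumes the OpenAI Responses schema: `output` is a list of items,
    and "message" items carry a `content` list whose "output_text"
    blocks hold the generated text.
    """
    parts = []
    for item in response.output:
        if getattr(item, "type", None) == "message":
            for block in item.content:
                if getattr(block, "type", None) == "output_text":
                    parts.append(block.text)
    return "".join(parts)
```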
@krrishdholakia @ishaan-jaff just highlighting SamA's tweet about adding MCP support to the Responses API
Any interest in making MCP servers configured in LiteLLM also available through Responses API?
cc: @ishaan-jaff - since you're working on responses api support
You may want to rely on my package:
https://pypi.org/project/openai-responses-server/ (code: https://github.com/teabranch/openai-responses-server/)
It bridges the Responses API with the chat completions endpoint by managing event state.
we do the transformation between /responses API and /chat/completion already @OriNachum - https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57
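At a high level the mapping looks roughly like this (an illustrative sketch only, not the code in the linked transformation.py):

```python
def responses_input_to_chat_messages(input, instructions=None):
    """Rough idea of the /responses -> /chat/completions request mapping.

    `instructions` becomes a system message, and `input` (either a plain
    string or a list of input items) becomes the conversation messages.
    """
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    if isinstance(input, str):
        messages.append({"role": "user", "content": input})
    else:
        for item in input:
            messages.append({
                "role": item.get("role", "user"),
                "content": item.get("content", ""),
            })
    return messages
```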
@krrishdholakia how do you handle tool use? Event management is handled differently.
Also, the Responses API provides Code Interpreter, file search, web search, MCPs (soon), and computer use.
Do you support these?
@OriNachum can you share an example of what you're trying to do?
Not sure I follow.
@krrishdholakia This is what we're doing: a full implementation of https://platform.openai.com/docs/api-reference/responses
where the AI provider just needs to support the chat completions endpoint.
From the link: "Extend the model's capabilities with built-in tools for file search, web search, computer use, and more. Allow the model access to external systems and data using function calling."
That's interesting, so you are writing not only a format translator for Responses -> Chat, but a server that implements the actual features OpenAI offers in Responses (search, code interpreter, file search), like the features in the old Assistants API, is that right?
What do you support so far? That seems very useful!
Confirmed!
Right now I support text and tool use in streaming (the basics). I need to validate non-streaming again, but that's a simple use case.
I'm working on adding files endpoint integration: the ability to upload files, then index and search them semantically. I'm basing this on a RagFlow integration.
Then I will add either Code Interpreter or web search (no particular priority between them, unless asked).
If you're looking or waiting for something specific, you can open an issue.
@RodolfoCastanheira continuing my previous reply, I plan on adding web search and file search in the coming days.
File search: probably support for Graphiti first, then an option for RagFlow (they only officially support x86, and I'm also aiming at Nvidia Jetson ARM devices).
Web search: still researching. For a web solution, Tavily is a strong candidate, but I also want a local option (for privacy).
Greetings folks,
I'm on a journey to get Codex running and accepting local models. As you might be aware, Codex uses the Responses API deeply internally (and its maintainers refuse to patch it to allow locally running models). I finally got to this issue, tried out the vLLM backend with LiteLLM, and can confirm it's not working.
Assumptions I am making below, which could potentially be wrong:
1. LiteLLM is trying to forward the /v1/responses request to the backend at https://vllm-endpoint/v1/responses
2. The backend (vLLM) doesn't support this endpoint, so it returns a 500 error
3. This is causing a LiteLLM error: 'NoneType' object has no attribute 'get'
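For reference, roughly the SDK-level equivalent of what I believe the proxy is doing for me (the hosted_vllm provider prefix, api_base, and model name are placeholders/assumptions based on the LiteLLM docs, not my exact setup):

```python
import litellm

# My vLLM server only implements /v1/chat/completions, not /v1/responses,
# so this is where I'd expect the translation (or the 500) to happen.
resp = litellm.responses(
    model="hosted_vllm/Qwen/Qwen2.5-7B-Instruct",
    api_base="http://localhost:8000/v1",
    input="hello",
)
print(resp)
```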
I am interested in working out a solution that can use LiteLLM but internally shuffle around and rewrite events, not unlike @OriNachum's implementation (I presume).
Or do you think this is retrofitting Responses into using /v1/chat/completions and doesn't make sense to do?
I think the greater vLLM ecosystem's acceptance of Responses could take longer because of its statefulness.
Seeing as this beta build has -e STORE_MODEL_IN_DB=True, my guess is that LiteLLM is not afraid to tackle this statefulness in the proxy. So if there's something I can do to help align a PR for transforming/shuffling the data before it gets passed to vLLM and back, I'd appreciate some pointers to get started; if it overlaps with vLLM too, I can try to open PRs there as well and have things working together.
Any direction or pointers on where to begin looking would be greatly appreciated.
@theycallmeloki why not use my server as a proxy?
The Responses API is not just a tunnel and event-state manager on top of chat completions; it also provides hosted tool support: web search, file search, computer use.
And future tools: code interpreter and an MCP manager.
Ah, I am currently using the LiteLLM proxy already, and it's tricky to fine-tune without chat completions (for /v1/responses fine-tuning I wouldn't even know where to begin looking). The earlier half of my day went into trying to figure out a way to use open-responses, but halfway through they wanted me to deploy an entire stack of microservices just to get a proxy that does the OpenAI Responses API (mostly due to the statefulness involved, like the one I mentioned above), and I gave up after failing to get Postgres up for it.
I did look through your solution too, but dropped off as I reached babytau because it seemed to be meant more for ARM; I am completely on x86. I understand I can use this link as a potential starter kit, but I am not too sure whether that would work, or how I would need to involve the parent repo here or its deployable counterpart here.
If you could make a Dockerfile that has just the steps necessary to accomplish the following things, I think not just me but potentially many more people who come across this would gladly use your work. Steps:
- I should be able to `pip install openai-responses-server` in a step in a Dockerfile
- I should be able to specify the OpenAI base URL and the supported models (in your example, http://localhost:11434 with llama-3b should be fine, but I do need some programmatic way to configure this, potentially in the following step as a command-line argument? Or, like Codex, I don't mind env variables either)
- I should be able to spin up `python -m openai-responses-server 8080`, which acts as a 1-1 just-in-time proxy for bridging these disparate calls into regular `/v1/chat/completions` tool calls that work with the local Ollama and permeate responses back upstream