Responses max_output_tokens
🚀 Describe the new functionality needed
The OpenAI Responses API supports a `max_output_tokens` field. It is currently missing from both the create request and the response object in llama-stack's Responses API.
Requirements
- Implement `max_output_tokens` as an input argument for the Responses API, and add it to the response object
- Pass it through to the completions request translation, and ensure it is returned in the response
💡 Why is this needed? What if we don't build it?
This closes a feature gap relative to the OpenAI Responses API spec.
Other thoughts
Good First Issue
Hey @iamemilio, I would like to work on this. Could you please point me to a good starting point and the files to look at?
Thank you.
/assign
Of course @abhibongale!
Responses API types are stored here, which is where you can find the response object. It should be updated to contain `max_output_tokens` based on the OpenAI spec. I also like to check the OpenAI Python types to get a sense of the best way to implement that field in Python.
Once you do that, you can add it to the responses create request.
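To make the shape of the change concrete, here is a simplified sketch. The real llama-stack types are Pydantic models with many more fields; the class and field names below (other than `max_output_tokens` itself) are hypothetical stand-ins using plain dataclasses:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified stand-ins for the Responses API models in
# llama-stack (the real ones are Pydantic models with many more fields).
# Per the OpenAI spec, max_output_tokens is an optional integer.

@dataclass
class OpenAIResponseObject:
    id: str
    model: str
    max_output_tokens: Optional[int] = None  # new field; None when unset

@dataclass
class ResponseCreateRequest:
    model: str
    input: str
    max_output_tokens: Optional[int] = None  # accepted as an input argument
```

Making the field optional with a `None` default keeps existing callers that don't pass it working unchanged.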
Finally, you will have to pass it through to the providers. Responses in llama-stack today translates the Responses API request into a completions request. Completions has a similar field, `max_tokens`, which we can use to pass this through in the inline implementation of the Responses API here.
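The pass-through step amounts to mapping one field name onto another. The helper below is hypothetical (the real translation in llama-stack builds a full completions request); it only illustrates the `max_output_tokens` → `max_tokens` mapping:

```python
from typing import Any, Dict, Optional

def responses_to_chat_completion_params(
    model: str,
    max_output_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Sketch of the Responses -> completions translation for this field.

    Hypothetical helper: the real code assembles the entire completions
    request; shown here only for the max_output_tokens mapping.
    """
    params: Dict[str, Any] = {"model": model}
    if max_output_tokens is not None:
        # Completions expresses the same limit as max_tokens
        params["max_tokens"] = max_output_tokens
    return params
```

On the way back, the same value should be copied onto the response object so callers can see what limit was applied.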
Once you finish your changes, make sure to run:
openapi type generation:
```
uv run python -m docs.openapi_generator.generate docs/_static
```
unit tests:
```
uv run --group unit pytest -sv llama-stack/tests/unit/
```
pre-commit:
```
uv run pre-commit run
```
If those pass, then you're looking pretty good and are ready to update the integration tests: https://github.com/llamastack/llama-stack/tree/main/tests/integration
Hope that helps!
I see there was a PR for the Responses API. Are there any plans to fix this for the Agent API? I'm still using it, though I know it's deprecated. If there are no plans to fix it for the Agent API, that gives me some extra motivation to switch over to Responses.
Hi @gnunn1, this is not currently planned to be contributed back to the Agents API. That API was deprecated for reasons including: it deviated strongly from industry standards and known design patterns, and there are scaling and security implications to hosting multi-tenant agents on the same server, which may also be used as a router by other tenants for things like inference or RAG tools. A colleague of mine is working on a notebook to help users migrate from the Agents API to the Responses API. If you would like, here is the link: https://github.com/opendatahub-io/agents/blob/main/migration/legacy-agents/responses-api-agent-migration.ipynb
Thanks, appreciate the quick reply. I'll start work on migrating my client.
@abhibongale I saw your PR was closed, do you still want to work on this, or can another contributor pick this up?
@iamemilio yes, if another contributor wants to pick this up, I am happy with that.