
Responses max_output_tokens

Open iamemilio opened this issue 3 months ago • 7 comments

🚀 Describe the new functionality needed

The OpenAI Responses API has a max_output_tokens field (Completions has the analogous max_tokens). It is currently missing from both the create request and the response object in the llama-stack Responses API.

Requirements

  • Implement the max_output_tokens field as an input argument to the Responses API, and add it to the response object
  • Pass it through to the Completions request translation, and ensure that it is returned

💡 Why is this needed? What if we don't build it?

This is a feature gap relative to the OpenAI Responses API spec.

Other thoughts

Good First Issue

iamemilio avatar Sep 26 '25 14:09 iamemilio

Hey @iamemilio, I would like to work on this. Could you please point me to a good starting point and which files to look at?

Thank you.

/assign

abhibongale avatar Sep 29 '25 13:09 abhibongale

Of course @abhibongale!

Responses API types are stored here, which is where you can find the response object. It should be updated to contain max_output_tokens based on the OpenAI spec. I also like to check the OpenAI Python types to get a sense of the best way to implement that field in Python.

Once you do that, you can add it to the responses create request.
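
To make the shape concrete, here is a minimal sketch of both halves of that change. The class names below are illustrative stand-ins, not the actual llama-stack types; only the max_output_tokens field itself comes from the OpenAI spec.

```python
# Sketch only: class names are hypothetical stand-ins for the real
# Responses API types, which live in the type module linked above.
from pydantic import BaseModel


class OpenAIResponseObject(BaseModel):
    """Stand-in for the Responses API response object."""

    id: str
    model: str
    # Per the OpenAI spec: an upper bound on the number of tokens the
    # model may generate, or None for no explicit limit.
    max_output_tokens: int | None = None


class OpenAIResponseCreateRequest(BaseModel):
    """Stand-in for the Responses API create request."""

    model: str
    input: str
    max_output_tokens: int | None = None
```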

Finally, you will have to pass it through to the providers. Responses in Llama Stack today is implemented as a translation from the Responses API to Completions. Completions has a similar field, max_tokens, which we can use to pass this through in the inline implementation of the Responses API here.
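
As a rough sketch of that pass-through (the helper and its shape are assumptions; the real code path is the inline Responses-to-Completions translation linked above):

```python
# Hypothetical translation helper, taking the create request sketched
# earlier and producing Completions-style parameters.
def responses_to_completion_params(request) -> dict:
    params: dict = {
        "model": request.model,
        # ... other translated fields elided ...
    }
    # Responses' max_output_tokens maps onto Completions' max_tokens.
    if request.max_output_tokens is not None:
        params["max_tokens"] = request.max_output_tokens
    return params
```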

Once you finish your changes, make sure to run:

  • OpenAPI type generation:

    uv run python -m docs.openapi_generator.generate docs/_static

  • Unit tests:

    uv run --group unit pytest -sv llama-stack/tests/unit/

  • pre-commit:

    uv run pre-commit run

If those pass, then you're looking pretty good and are ready to update the integration tests: https://github.com/llamastack/llama-stack/tree/main/tests/integration
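
For the integration side, a test along these lines could verify the round-trip. The openai_client fixture name and the model id are assumptions, not the suite's actual fixtures:

```python
# Sketch of an integration test; fixture and model names are assumptions.
def test_responses_max_output_tokens(openai_client):
    response = openai_client.responses.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # any model the stack serves
        input="Write a long story about llamas.",
        max_output_tokens=16,
    )
    # The field should round-trip onto the response object...
    assert response.max_output_tokens == 16
    # ...and, if the provider honors the cap, bound the generated tokens.
    if response.usage is not None:
        assert response.usage.output_tokens <= 16
```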

Hope that helps!

iamemilio avatar Sep 29 '25 14:09 iamemilio

I see there was a PR for the Responses API. Are there any plans to fix this for the Agent API? I'm still using it, though I know it's deprecated. If there are no plans to fix it for the Agent API, that gives me some extra motivation to switch over to Responses.

gnunn1 avatar Nov 18 '25 22:11 gnunn1

Hi @gnunn1, this is not currently planned to be contributed back to the Agents API. That API was deprecated for reasons like: it deviated strongly from industry standards and known design patterns, and there are scaling and security implications of having multi-tenant agents on the same server, which may also be used as a router by other tenants for things like inference or RAG tools. A colleague of mine is working on a notebook to help users migrate from Agents to the Responses API. If you would like, here is the link: https://github.com/opendatahub-io/agents/blob/main/migration/legacy-agents/responses-api-agent-migration.ipynb
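
For a rough flavor of the shift (this is not the notebook's actual content; the base URL and model id are placeholders, and the calls follow the OpenAI-compatible client rather than any llama-stack-specific API):

```python
from openai import OpenAI

# Placeholder endpoint: point this at your llama-stack server's
# OpenAI-compatible base URL.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# Agents kept conversation state in a server-side agent + session;
# with Responses, state is chained via previous_response_id instead.
first = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    instructions="You are a helpful assistant.",
    input="Hello!",
)
follow_up = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    previous_response_id=first.id,  # carries the conversation forward
    input="Can you expand on that?",
)
```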

iamemilio avatar Nov 18 '25 22:11 iamemilio

Thanks, appreciate the quick reply. I'll start work on migrating my client.

gnunn1 avatar Nov 18 '25 23:11 gnunn1

@abhibongale I saw your PR was closed. Do you still want to work on this, or can another contributor pick it up?

iamemilio avatar Nov 19 '25 17:11 iamemilio

@iamemilio yes, if another contributor wants to pick this up, I am happy with that.

abhibongale avatar Nov 19 '25 20:11 abhibongale