Fix AttributeError in streaming response cleanup
This PR fixes issue #3185.
The code calls `await event_gen.aclose()`, but OpenAI's `AsyncStream` doesn't have an `aclose()` method; it has `close()` (which is async).
When clients cancel streaming requests, the server tries to clean up with:
```python
await event_gen.aclose()  # ❌ AsyncStream doesn't have aclose()!
```
But `AsyncStream` has never had a public `aclose()` method. The error message literally tells us:
```
AttributeError: 'AsyncStream' object has no attribute 'aclose'. Did you mean: 'close'?
```
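For context, here is a minimal reproduction sketch of the failure mode; `FakeAsyncStream` is a hypothetical stand-in for the SDK's stream object, not OpenAI's actual class:

```python
import asyncio

class FakeAsyncStream:
    """Hypothetical stand-in mimicking OpenAI AsyncStream's surface."""
    def __aiter__(self):
        return self
    async def __anext__(self):
        raise StopAsyncIteration
    async def close(self):   # the SDK exposes an async close() ...
        pass                 # ... but no PEP 525 aclose()

async def main():
    event_gen = FakeAsyncStream()
    await event_gen.aclose()  # AttributeError: ... Did you mean: 'close'?

asyncio.run(main())
```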
Verification
- Reproduction script: `reproduce_issue_3185.sh` can be used to verify the fix.
- Manual checks, validation against original OpenAI library code.
@r-bit-rry a chain of `hasattr` like this suggests we've done something wrong in the design. have we, or can we just call `close`?
It really comes down to what we want to support. Since this was never strictly typed, I'm assuming there are other objects that can be generated by the `sse_generator`. I'd go even a step further and ask why we even assume a `close` method exists?
And on a more serious note @mattf: PEP 525 defines `aclose()` for async generators. The OpenAI and Anthropic SDKs deviate from the standard and use `close()` instead (while under the hood they both call `httpx.Response.aclose()`). We cannot control third-party SDK design choices,
so it's our decision whether we want to enforce certain typings and act on them, or let this pattern "catch all".
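(For reference, a quick check of the standard behavior described above: plain async generators do carry `aclose()`.)

```python
import asyncio

async def event_gen():
    yield "chunk"

async def main():
    gen = event_gen()
    print(hasattr(gen, "aclose"))  # True: PEP 525 async generators define aclose()
    await gen.aclose()             # standard cleanup path, no SDK involved

asyncio.run(main())
```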
@mattf sure thing, I'll start working on those
@mattf As I see it, we're facing two options to avoid the `hasattr` chain when treating the `AsyncStream` object:
option 1: wrapping

```python
async def wrap_async_stream(stream):
    async for item in stream:
        yield item
```

this is explicit and simple but carries a small overhead per chunk
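A hedged usage sketch of option 1: the wrapper is a real async generator, so the server-side `aclose()` call works again (`sdk_like_stream` is an illustrative stand-in for the SDK stream):

```python
import asyncio

async def wrap_async_stream(stream):
    async for item in stream:
        yield item

async def sdk_like_stream():              # stand-in for the SDK stream's iteration
    yield "data: chunk-1"
    yield "data: chunk-2"

async def main():
    event_gen = wrap_async_stream(sdk_like_stream())
    print(await event_gen.__anext__())    # data: chunk-1
    await event_gen.aclose()              # works: async generators implement aclose()

asyncio.run(main())
```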
option 2: adapter pattern

```python
class AsyncStreamAdapter(AsyncIterator[T]):
    def __init__(self, stream): self._stream = stream
    async def aclose(self): await self._stream.close()  # delegate aclose() -> close()
```

direct delegation with no re-yielding and a more explicit intent
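For completeness, a self-contained sketch of what option 2 might look like once the iteration protocol is filled in; the class name and generic parameter are illustrative, not the committed implementation:

```python
from collections.abc import AsyncIterator
from typing import TypeVar

T = TypeVar("T")

class AsyncStreamAdapter(AsyncIterator[T]):
    """Wraps an SDK stream that exposes close() and presents a standard aclose()."""

    def __init__(self, stream):
        self._stream = stream

    def __aiter__(self):
        return self

    async def __anext__(self) -> T:
        return await self._stream.__anext__()

    async def aclose(self) -> None:
        await self._stream.close()  # delegate aclose() to the SDK's async close()
```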
regarding the locations of the violations where we will need patching, these are the places I was able to spot (a sketch of the intended patch follows the list):
- `_maybe_overwrite_id()` (`src/llama_stack/providers/utils/inference/openai_mixin.py:251`): when `overwrite_completion_id=False` and `stream=True`, returned `AsyncStream` (which has `close()`) instead of `AsyncIterator` (which has `aclose()`)
- `PassthroughInferenceAdapter.openai_chat_completion()` (`src/llama_stack/providers/remote/inference/passthrough/passthrough.py:123-136`): returned the raw `client.chat.completions.create()` response
- `PassthroughInferenceAdapter.openai_completion()` (`src/llama_stack/providers/remote/inference/passthrough/passthrough.py:108-121`): returned the raw `client.completions.create()` response
- `LiteLLMOpenAIMixin.openai_chat_completion()` (`src/llama_stack/providers/utils/inference/litellm_openai_mixin.py:222-278`): returned the raw `litellm.acompletion()` result
- `LiteLLMOpenAIMixin.openai_completion()` (`src/llama_stack/providers/utils/inference/litellm_openai_mixin.py:179-220`): returned the raw `litellm.atext_completion()` result
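As referenced above, a hedged sketch of the kind of patch intended at these call sites; the commented call site is illustrative and does not reproduce the actual `openai_mixin.py` code:

```python
from collections.abc import AsyncIterator
from typing import Any

async def wrap_async_stream(stream) -> AsyncIterator[Any]:
    # Re-yield SDK chunks so callers always receive a PEP 525 async generator
    # (which has aclose()), regardless of the SDK's own stream type.
    async for item in stream:
        yield item

# Hypothetical shape of the streaming branch in _maybe_overwrite_id:
# if stream and not overwrite_completion_id:
#     return wrap_async_stream(sdk_response)  # instead of returning the raw AsyncStream
```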
@r-bit-rry great finds! it looks like we're violating the api contract and using # ignore to cover it up...resulting in the bug. what fix do you suggest?
@mattf I suggest using the first option, with direct declaration of the types, and changing the API contract for `openai_completion`, removing the `# type: ignore` comments. Since I'm already touching all the places where the return value of the method is handled, I think fixing the contract there is the most responsible way of handling this.
@r-bit-rry sounds good!
@mattf as I'm digging, I'm finding new stuff. Apparently we have a discrepancy between the official and internal documentation of `openai_completion` and the implementation in place. Request schema from `inference.py:871`:

```python
class OpenAICompletionRequestWithExtraBody(BaseModel, extra="allow"):
    # ...
    stream: bool | None = None  # ← streaming IS a parameter!
```
however, at first glance the documentation, both on our end and on OpenAI's official end, does not seem to support streaming to begin with (`docs/docs/contributing/new_api_provider.mdx:40-43`):

- `openai_completion()`: Legacy text completion API with full parameter support
- `openai_chat_completion()`: Chat completion API supporting streaming, tools, and function calling
The OpenAI documentation (schema `CreateCompletionResponse`) says: "Represents a completion response from the API. Note: both the streamed and non-streamed response objects share the same shape (unlike the chat endpoint)." So only our documentation is lacking. I've taken this into account in the changes I've committed.
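To illustrate the shared-shape point, a hedged sketch of calling the legacy completions endpoint with `stream=True` through the OpenAI client; the base URL, API key, and model name are placeholder assumptions, not the project's actual defaults:

```python
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
    stream = await client.completions.create(
        model="example-model", prompt="Hello", max_tokens=16, stream=True
    )
    async for chunk in stream:   # streamed chunks share CreateCompletionResponse's shape
        print(chunk.choices[0].text, end="", flush=True)
    await stream.close()         # note: the SDK stream exposes close(), not aclose()

asyncio.run(main())
```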
can we get rid of the type ignores now?
Remaining `type: ignore` comments are only for external library issues:
- `type: ignore[arg-type]`: LiteLLM streaming types don't match OpenAI's
- `type: ignore[return-value]`: external libs lack type stubs
what about passthrough?
Regarding the passthrough `type: ignore` comments: those are actually for the non-streaming path, not the streaming fix.
The issue is just a type mismatch between what the OpenAI SDK returns (`Completion`/`ChatCompletion`) and what the Llama Stack types expect (`OpenAICompletion`/`OpenAIChatCompletion`). Structurally they're the same, but as far as I understand mypy doesn't see them as identical.
The streaming path (`wrap_async_stream(response)`) doesn't have any type ignores and doesn't need one, since `wrap_async_stream` properly returns `AsyncIterator[T]`.
So removing these would need either explicit type conversions or aligning the Llama Stack type definitions with the SDK types directly; I think this is out of scope for this fix. Happy to tackle that separately if you think it's worth it, though.
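For illustration only, the "explicit type conversions" route could look roughly like this, assuming both the SDK models and the Llama Stack models are pydantic; the helper name and the commented usage are hypothetical:

```python
from typing import TypeVar
from pydantic import BaseModel

M = TypeVar("M", bound=BaseModel)

def convert_sdk_model(sdk_obj: BaseModel, target: type[M]) -> M:
    # Serialize the SDK model and re-validate it against the structurally-identical
    # target model, so mypy sees the declared return type without a type: ignore.
    return target.model_validate(sdk_obj.model_dump())

# Hypothetical usage at a passthrough call site:
# return convert_sdk_model(response, OpenAIChatCompletion)
```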
i see. pls file an issue about that so we can resolve it later.
@mattf I created #4369 per your request.
✱ Stainless preview builds
This PR will update the llama-stack-client SDKs with the following commit message.
fix(inference): AttributeError in streaming response cleanup
⚠️ llama-stack-client-node
There was a regression in your SDK.
generate ⚠️ → build ✅ → lint ✅ → test ✅
`npm install https://pkg.stainless.com/s/llama-stack-client-node/b634edc8ffeed4f76fd9bd0eeedafd0f441a9dad/dist.tar.gz`
⚠️ llama-stack-client-kotlin
There was a regression in your SDK.
generate ⚠️ → lint ✅ → test ❗
⚠️ llama-stack-client-go
There was a regression in your SDK.
generate ❗ → lint ❗ → test ❗
`go get github.com/stainless-sdks/llama-stack-client-go@703cd5b2f741deeb209cc8d1a039265ea58676e0`
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2025-12-14 12:58:02 UTC