tanuki.py
Support for streaming responses
LLM providers, notably OpenAI, support streamed responses. tanuki.py should support them as well.
Requirements:
- Iterator-typed outputs should be streamable by default.
- Support thousands of streamed outputs through last-n context management.
- TDA (test-driven alignment) should use the 'in' syntax with iterator-as-list objects to specify the first N examples that should be streamed (see the sketch after this list).
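A minimal sketch of what this could look like from the caller's side, assuming the existing `@tanuki.patch` / `@tanuki.align` decorators. The function names, the city/landmark example, and the list-`in`-iterator alignment syntax are illustrative assumptions for this proposal, not current tanuki.py behaviour:

```python
from typing import Iterator

import tanuki


@tanuki.patch
def generate_landmarks(city: str, n: int) -> Iterator[str]:
    """Return n landmark names for the given city."""


@tanuki.align
def align_generate_landmarks():
    # Proposed TDA syntax: asserting a list 'in' the iterator output would
    # pin the first N examples expected at the head of the stream.
    assert ["Eiffel Tower", "Louvre"] in generate_landmarks("Paris", 5)


# With an Iterator-typed output, each element would be yielded as the
# provider streams it back, instead of waiting for the full response.
for landmark in generate_landmarks("Paris", 5):
    print(landmark)
```

Last-n context management would presumably keep only the most recent n streamed elements in the prompt context, which is what would let a single call emit thousands of outputs without exhausting the context window.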