haystack-core-integrations
Add vLLM Chat Generator integration to support vLLM-specific features like `reasoning_content`
Feature Request: Support for Custom Response Parameters in OpenAIChatGenerator
Problem Description
When using OpenAIChatGenerator with OpenAI-compatible APIs that return additional custom parameters in the response delta (such as reasoning_content, thinking_content, or other provider-specific fields), these parameters are silently dropped during streaming chunk conversion.
For example, when using APIs that provide reasoning capabilities or additional metadata, the current implementation extracts only the standard OpenAI fields (content, tool_calls, etc.) and discards any custom fields present in the choice.delta object.
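As a concrete illustration, the parsed choice.delta from a vLLM server running a reasoning parser might carry a payload like the following (the reasoning_content field name follows vLLM's reasoning-parser output; the exact shape varies by provider):

# Illustrative parsed `choice.delta` from a vLLM reasoning model
{
    "role": "assistant",
    "content": None,
    "reasoning_content": "First, consider the constraints of the problem...",
}

Today only content (here empty) survives the conversion; reasoning_content never reaches the user.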
Current Behavior
In the current _convert_chat_completion_chunk_to_streaming_chunk method, only predefined fields are extracted:
content = choice.delta.content or ""
chunk_message = StreamingChunk(content)
chunk_message.meta.update({
    "model": chunk.model,
    "index": choice.index,
    "tool_calls": choice.delta.tool_calls,
    "finish_reason": choice.finish_reason,
    "received_at": datetime.now().isoformat(),
})
Any additional fields in choice.delta are ignored.
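A minimal sketch of the gap, assuming the server includes reasoning_content in its deltas and that the SDK's pydantic models keep unrecognized keys (the OpenAI Python SDK permits extra fields on its response models):

# The raw SDK delta still carries the provider-specific field...
delta_dict = choice.delta.model_dump()
reasoning = delta_dict.get("reasoning_content")   # e.g. "First, consider..."

# ...but the converted StreamingChunk has no trace of it:
chunk_message.meta.get("reasoning_content")       # None - the field was dropped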
Proposed Solution
Enhance the _convert_chat_completion_chunk_to_streaming_chunk method to:
- Extract all fields from the delta object and include them in the StreamingChunk metadata
- Prefix custom fields with a namespace (e.g., delta_) to avoid conflicts with standard metadata keys
- Maintain backward compatibility by keeping all existing behavior intact
Suggested Implementation
def _convert_chat_completion_chunk_to_streaming_chunk(self, chunk: ChatCompletionChunk) -> StreamingChunk:
    # ... existing logic for empty chunks ...
    choice: ChunkChoice = chunk.choices[0]
    content = choice.delta.content or ""
    chunk_message = StreamingChunk(content)

    # Extract all delta fields dynamically
    delta_fields = {}
    if hasattr(choice.delta, 'model_dump'):
        try:
            delta_dict = choice.delta.model_dump()
            for key, value in delta_dict.items():
                if key not in ['content', 'tool_calls'] and value is not None:
                    delta_fields[f"delta_{key}"] = value
        except Exception:
            # Fallback to manual attribute extraction
            for attr in dir(choice.delta):
                if not attr.startswith('_') and attr not in ['content', 'tool_calls']:
                    try:
                        value = getattr(choice.delta, attr)
                        if value is not None and not callable(value):
                            delta_fields[f"delta_{attr}"] = value
                    except Exception:
                        continue

    chunk_message.meta.update({
        "model": chunk.model,
        "index": choice.index,
        "tool_calls": choice.delta.tool_calls,
        "finish_reason": choice.finish_reason,
        "received_at": datetime.now().isoformat(),
        **delta_fields,  # Include all custom delta fields
    })
    return chunk_message
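To make the intended behavior concrete, here is a hypothetical check (not an existing test; generator stands for an OpenAIChatGenerator instance, and the extra key in the delta relies on the SDK's models accepting unknown fields):

from openai.types.chat import ChatCompletionChunk

chunk = ChatCompletionChunk.model_validate({
    "id": "chunk-1",
    "object": "chat.completion.chunk",
    "created": 1700000000,
    "model": "my-reasoning-model",
    "choices": [{
        "index": 0,
        "delta": {"content": "", "reasoning_content": "thinking..."},
        "finish_reason": None,
    }],
})

streaming_chunk = generator._convert_chat_completion_chunk_to_streaming_chunk(chunk)
assert streaming_chunk.meta["delta_reasoning_content"] == "thinking..."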
Use Cases
This feature would enable support for:
- vLLM reasoning parsers that return reasoning_content (see the configuration sketch after this list)
- Custom model providers with additional metadata fields
- Extended OpenAI-compatible APIs without requiring custom generator implementations
- Future OpenAI API extensions without breaking changes
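Taking the vLLM case as an example: with the proposal in place, pointing the stock generator at a vLLM server's OpenAI-compatible endpoint would be enough to surface reasoning_content, with no custom subclass. The URL and model name below are placeholders:

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.utils import Secret

generator = OpenAIChatGenerator(
    api_key=Secret.from_token("unused-for-local-vllm"),  # vLLM typically ignores the key
    api_base_url="http://localhost:8000/v1",             # placeholder vLLM endpoint
    model="my-reasoning-model",                          # placeholder model name
)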
Benefits
- Improved compatibility with OpenAI-compatible APIs
- Future-proofing against new OpenAI API features
- Reduced need for custom implementations for minor API differences
- Enhanced debugging capabilities with access to all response metadata
- Zero breaking changes to existing functionality
Example Usage
After implementation, users could access custom parameters like this:
from haystack.dataclasses import StreamingChunk

def streaming_callback(chunk: StreamingChunk):
    content = chunk.content
    reasoning = chunk.meta.get('delta_reasoning_content', '')
    custom_field = chunk.meta.get('delta_custom_parameter', '')
    # Process both standard and custom content
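Wired into a streaming run, the callback above would see both the standard and the namespaced fields as chunks arrive (a sketch; endpoint and model name are placeholders as before):

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

generator = OpenAIChatGenerator(
    api_key=Secret.from_token("unused-for-local-vllm"),
    api_base_url="http://localhost:8000/v1",
    model="my-reasoning-model",
    streaming_callback=streaming_callback,
)
result = generator.run(messages=[ChatMessage.from_user("Explain your reasoning.")])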
This would work seamlessly with any OpenAI-compatible API that includes additional response parameters, making Haystack more flexible and compatible with the growing ecosystem of LLM providers.