
Add vLLM Chat Generator integration to support vLLM-specific features like `reasoning_content`

Open · AyRickk opened this issue 5 months ago · 4 comments

Feature Request: Support for Custom Response Parameters in OpenAIChatGenerator

Problem Description

When using OpenAIChatGenerator with OpenAI-compatible APIs that return additional custom parameters in the response delta (such as reasoning_content, thinking_content, or other provider-specific fields), these parameters are currently ignored and lost during the streaming chunk conversion process.

For example, when using APIs that provide reasoning capabilities or additional metadata, the current implementation only extracts standard OpenAI fields (content, tool_calls, etc.) and discards any custom fields that might be present in the choice.delta object.

Current Behavior

In the current _convert_chat_completion_chunk_to_streaming_chunk method, only predefined fields are extracted:

# Only a fixed set of standard fields is copied into the chunk metadata;
# any extra fields on choice.delta are dropped
content = choice.delta.content or ""
chunk_message = StreamingChunk(content)
chunk_message.meta.update({
    "model": chunk.model,
    "index": choice.index,
    "tool_calls": choice.delta.tool_calls,
    "finish_reason": choice.finish_reason,
    "received_at": datetime.now().isoformat(),
})

Any additional fields in choice.delta are ignored.
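
For illustration, here is roughly what choice.delta contains, dumped to a dict, when a vLLM server streams with its reasoning parser enabled. The reasoning_content field name is what vLLM emits; the values are made up:

# Illustrative choice.delta contents from a reasoning-enabled vLLM server
delta_dict = {
    "role": "assistant",
    "content": None,                                       # no answer text yet
    "tool_calls": None,
    "reasoning_content": "First, compare the options...",  # dropped by the current code
}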

Proposed Solution

Enhance the _convert_chat_completion_chunk_to_streaming_chunk method to:

  1. Extract all fields from the delta object and include them in the StreamingChunk metadata
  2. Prefix custom fields with a namespace (e.g., delta_) to avoid conflicts with standard metadata (see the sketch after this list)
  3. Maintain backward compatibility by keeping all existing behavior intact
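
As a sketch of the namespacing in point 2, a delta that carries reasoning_content would surface in the chunk metadata roughly like this (all values are illustrative):

# Hypothetical StreamingChunk.meta after the proposed change:
{
    "model": "deepseek-r1",
    "index": 0,
    "tool_calls": None,
    "finish_reason": None,
    "received_at": "2025-06-13T13:06:00",
    "delta_role": "assistant",                                   # standard field, namespaced
    "delta_reasoning_content": "First, compare the options...",  # custom field, preserved
}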

Suggested Implementation

def _convert_chat_completion_chunk_to_streaming_chunk(self, chunk: ChatCompletionChunk) -> StreamingChunk:
    # ... existing logic for empty chunks ...

    choice: ChunkChoice = chunk.choices[0]
    content = choice.delta.content or ""
    chunk_message = StreamingChunk(content)

    # Extract all delta fields dynamically so provider-specific extras
    # (e.g. reasoning_content) are preserved instead of silently dropped
    delta_fields = {}
    try:
        # The openai SDK's pydantic models expose model_dump(), which also
        # includes any extra fields the server returned
        delta_dict = choice.delta.model_dump()
        for key, value in delta_dict.items():
            # content and tool_calls are already handled explicitly below
            if key not in ("content", "tool_calls") and value is not None:
                delta_fields[f"delta_{key}"] = value
    except Exception:
        # Fallback to manual reflection if model_dump is missing or fails
        for attr in dir(choice.delta):
            if attr.startswith("_") or attr in ("content", "tool_calls"):
                continue
            try:
                value = getattr(choice.delta, attr)
                if value is not None and not callable(value):
                    delta_fields[f"delta_{attr}"] = value
            except Exception:
                continue

    chunk_message.meta.update({
        "model": chunk.model,
        "index": choice.index,
        "tool_calls": choice.delta.tool_calls,
        "finish_reason": choice.finish_reason,
        "received_at": datetime.now().isoformat(),
        **delta_fields,  # all custom delta fields, namespaced with "delta_"
    })
    return chunk_message
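
A quick way to exercise the proposed conversion (a sketch, not a test from the codebase: `generator` stands in for an OpenAIChatGenerator instance, and it relies on the openai SDK's pydantic models accepting extra fields such as reasoning_content):

from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_chunk import Choice, ChoiceDelta

# Build a chunk carrying a non-standard delta field, as a reasoning-enabled
# vLLM server would stream it (illustrative values)
chunk = ChatCompletionChunk(
    id="chunk-1",
    created=0,
    model="deepseek-r1",
    object="chat.completion.chunk",
    choices=[
        Choice(
            index=0,
            finish_reason=None,
            delta=ChoiceDelta(content="", reasoning_content="First, compare..."),
        )
    ],
)

streaming_chunk = generator._convert_chat_completion_chunk_to_streaming_chunk(chunk)
assert streaming_chunk.meta["delta_reasoning_content"] == "First, compare..."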

Use Cases

This feature would enable support for:

  1. vLLM's reasoning parser, which returns reasoning_content
  2. Custom model providers with additional metadata fields
  3. Extended OpenAI-compatible APIs without requiring custom generator implementations
  4. Future OpenAI API extensions without breaking changes

Benefits

  • Improved compatibility with OpenAI-compatible APIs
  • Future-proofing against new OpenAI API features
  • Reduced need for custom implementations for minor API differences
  • Enhanced debugging capabilities with access to all response metadata
  • Zero breaking changes to existing functionality

Example Usage

After implementation, users could access custom parameters like this:

def streaming_callback(chunk: StreamingChunk):
    content = chunk.content
    reasoning = chunk.meta.get('delta_reasoning_content', '')
    custom_field = chunk.meta.get('delta_custom_parameter', '')
    # Process both standard and custom content

This would work seamlessly with any OpenAI-compatible API that includes additional response parameters, making Haystack more flexible and compatible with the growing ecosystem of LLM providers.
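
For context, here is how this would slot into a streaming setup against a vLLM server once the proposed conversion is in place. This is a sketch: it assumes a local server started with vLLM's reasoning parser (e.g. `vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1`), and the model name is whatever that server exposes:

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, StreamingChunk
from haystack.utils import Secret

reasoning_parts = []

def streaming_callback(chunk: StreamingChunk):
    # With the proposed change, delta_reasoning_content shows up in chunk.meta
    reasoning_parts.append(chunk.meta.get("delta_reasoning_content", ""))
    print(chunk.content, end="", flush=True)

generator = OpenAIChatGenerator(
    api_key=Secret.from_token("EMPTY"),       # vLLM does not require a real key by default
    api_base_url="http://localhost:8000/v1",  # assumed local vLLM server
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    streaming_callback=streaming_callback,
)

result = generator.run(messages=[ChatMessage.from_user("Why is the sky blue?")])
print("\n--- reasoning ---\n" + "".join(reasoning_parts))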

AyRickk · Jun 13 '25, 13:06