haystack-core-integrations
Add vLLM integration: ChatGenerator and Reranker
Summary and motivation
ChatGenerator Motivation:
from https://github.com/deepset-ai/haystack-core-integrations/issues/1958
When using OpenAIChatGenerator with OpenAI-compatible APIs that return additional custom parameters in the response delta (such as reasoning_content, thinking_content, or other provider-specific fields), these parameters are currently ignored and lost during the streaming chunk conversion process.
For example, when using APIs that provide reasoning capabilities or additional metadata, the current implementation only extracts standard OpenAI fields (content, tool_calls, etc.) and discards any custom fields that might be present in the choice.delta object.
Reranker Motivation:
vLLM supports hosting models like Qwen3-Reranker, a new style of powerful reranker built on the Qwen3ForCausalLM architecture in Hugging Face Transformers. Reranking with causal LLMs has not yet been standardized, so model-specific preprocessing and post-processing is often needed to make these models work. For example, see the example code here for the Qwen3-Reranker.
We have clients who would like to use this model for reranking via vLLM hosting, so it would be nice to add a component to this integration that supports it.
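As a sketch of the kind of post-processing involved: causal-LM rerankers like Qwen3-Reranker are typically prompted to answer whether a document is relevant, and the relevance score is derived from the logits of the "yes"/"no" tokens. The logit values below are made up for illustration; in practice they would come from the vLLM-hosted model:

```python
import math

def relevance_score(yes_logit: float, no_logit: float) -> float:
    """Softmax over the 'yes'/'no' token logits -> probability of relevance."""
    e_yes = math.exp(yes_logit)
    e_no = math.exp(no_logit)
    return e_yes / (e_yes + e_no)

def rerank(doc_logits: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Score each (doc, yes_logit, no_logit) triple and sort by relevance, highest first."""
    scored = [(doc, relevance_score(y, n)) for doc, y, n in doc_logits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative logits only; real values would come from the hosted model.
logits = [("doc_a", 2.0, -1.0), ("doc_b", -0.5, 1.5)]
print(rerank(logits))
```

A Reranker component in this integration would wrap this kind of prompt formatting and logit post-processing behind the standard Haystack reranker interface.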
Checklist
If the request is accepted, ensure the following checklist is complete before closing this issue.
Tasks
- [ ] The code is documented with docstrings and was merged in the `main` branch
- [ ] Docs are published at https://docs.haystack.deepset.ai/
- [ ] There is a GitHub workflow running the tests for the integration nightly and on every PR
- [ ] A new label named like `integration:<your integration name>` has been added to the list of labels for this repository
- [ ] The labeler.yml file has been updated
- [ ] The package has been released on PyPI
- [ ] An integration tile has been added to https://github.com/deepset-ai/haystack-integrations
- [ ] The integration has been listed in the Inventory section of this repo README
- [ ] There is an example available to demonstrate the feature
- [ ] The feature was announced through social media