Title: "stream": true only returns output from the base model; LoRA adapter is ignored.
System Info
When using LoRAX with a LoRA adapter via the /v1/chat/completions endpoint, the adapter works as expected when "stream": false.
However, when I set "stream": true, the response clearly comes from the base model only, and the adapter specified via adapter_name appears to be ignored.
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
Works (`"stream": false`):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```

Broken (`"stream": true` only uses the base model):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```
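For convenience, here is a minimal Python sketch that sends both requests and prints the two outputs side by side. It assumes a LoRAX server listening at http://localhost:8080 (adjust the URL for your deployment) and uses the `requests` library; the payloads are the same as in the reproduction above. If the bug is present, the streamed text reads like base-model output while the non-streamed text reflects the adapter.

```python
import json
import requests

BASE_URL = "http://localhost:8080"  # assumption: local LoRAX server

payload = {
    "model": "Mistral-7B-Instruct-v0.1",
    "adapter_name": "Medical-Insights-QA",
    "messages": [{"role": "user", "content": "What are symptoms of cancer?"}],
}

# Non-streaming request: the adapter is applied as expected.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={**payload, "stream": False},
    timeout=120,
)
print("non-streaming:", resp.json()["choices"][0]["message"]["content"])

# Streaming request: the output appears to come from the base model only.
# Parse the OpenAI-compatible SSE stream ("data: {...}" lines, ending in "[DONE]").
chunks = []
with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={**payload, "stream": True},
    stream=True,
    timeout=120,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data:"):
            continue
        data = line[len(b"data:"):].strip()
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        chunks.append(delta.get("content") or "")  # first/last chunks may carry no content
print("streaming:   ", "".join(chunks))

# After a fix, both printed texts should be adapter-influenced
# (and identical under greedy/deterministic decoding).
```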
Expected behavior
When using the /v1/chat/completions endpoint with "stream": true, I expect the model to generate streamed responses using the specified LoRA adapter (adapter_name), just as it does when "stream": false. The adapter should influence generation in both streaming and non-streaming modes, producing consistent outputs aligned with the fine-tuned model.