Title: "stream": true only returns output from the base model; LoRA adapter is ignored.
System Info
When using LoRAX with a LoRA adapter via the /v1/chat/completions endpoint, the adapter works as expected when "stream": false.
However, when I set "stream": true, the response clearly comes from the base model only, and the adapter specified via adapter_name appears to be ignored.
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
Works (`"stream": false`):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```

Broken (`"stream": true` only uses the base model):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```
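For convenience, here is a minimal Python sketch that sends both requests and prints the two outputs side by side. It assumes a LoRAX server listening at http://localhost:8080 (adjust the URL for your deployment) and uses the `requests` library; the payloads are the same as in the reproduction above. If the bug is present, the streamed text reads like base-model output while the non-streamed text reflects the adapter.

```python
import json
import requests

BASE_URL = "http://localhost:8080"  # assumption: local LoRAX server

payload = {
    "model": "Mistral-7B-Instruct-v0.1",
    "adapter_name": "Medical-Insights-QA",
    "messages": [{"role": "user", "content": "What are symptoms of cancer?"}],
}

# Non-streaming request: the adapter is applied as expected.
resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={**payload, "stream": False},
    timeout=120,
)
print("non-streaming:", resp.json()["choices"][0]["message"]["content"])

# Streaming request: the output appears to come from the base model only.
# Parse the OpenAI-compatible SSE stream ("data: {...}" lines, ending in "[DONE]").
chunks = []
with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={**payload, "stream": True},
    stream=True,
    timeout=120,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data:"):
            continue
        data = line[len(b"data:"):].strip()
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        chunks.append(delta.get("content") or "")  # first/last chunks may carry no content
print("streaming:   ", "".join(chunks))

# After a fix, both printed texts should be adapter-influenced
# (and identical under greedy/deterministic decoding).
```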
Expected behavior
When using the /v1/chat/completions endpoint with "stream": true, I expect the model to generate streamed responses using the specified LoRA adapter (adapter_name), just as it does when "stream": false. The adapter should influence generation in both streaming and non-streaming modes, producing consistent outputs aligned with the fine-tuned model.