
Title: stream=true only returns output from the base model, LoRA adapter is ignored.

Open HARISHSENTHIL opened this issue 4 months ago • 0 comments

System Info

When using LoRAX with a LoRA adapter via the /v1/chat/completions endpoint, the adapter works as expected when "stream": false.

However, when I set "stream": true, the response is clearly from the base model only, and the adapter (adapter_name) appears to be ignored.

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

Works (stream: false, adapter is applied):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```

Broken (stream: true only uses the base model):

```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```
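For convenience, here is a minimal reproduction sketch that sends the same request with and without streaming and prints both completions so the adapter's influence can be compared. It assumes a LoRAX server reachable at http://localhost:8080 that speaks the OpenAI-compatible SSE streaming format; BASE_URL, the model name, and the adapter name are placeholders for your deployment.

```python
# Reproduction sketch (assumptions: local LoRAX server at localhost:8080,
# OpenAI-compatible /v1/chat/completions with SSE-style streaming).
import json
import requests

BASE_URL = "http://localhost:8080"  # adjust to your LoRAX deployment

payload = {
    "model": "Mistral-7B-Instruct-v0.1",
    "adapter_name": "Medical-Insights-QA",
    "messages": [{"role": "user", "content": "What are symptoms of cancer?"}],
}

# Non-streaming request: the adapter-influenced output is returned as expected.
resp = requests.post(f"{BASE_URL}/v1/chat/completions",
                     json={**payload, "stream": False}, timeout=120)
resp.raise_for_status()
print("stream=false:", resp.json()["choices"][0]["message"]["content"])

# Streaming request: per this report, the streamed tokens match the base model only.
chunks = []
with requests.post(f"{BASE_URL}/v1/chat/completions",
                   json={**payload, "stream": True}, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
        data = line[len(b"data:"):].strip()
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        chunks.append(delta.get("content") or "")
print("stream=true: ", "".join(chunks))
```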

Expected behavior

When using the /v1/chat/completions endpoint with "stream": true, I expect the model to generate streamed responses using the specified LoRA adapter (adapter_name) — just like it does when "stream": false. The adapter should influence generation in both streaming and non-streaming modes, resulting in consistent behavior and outputs aligned with the fine-tuned model.

HARISHSENTHIL · Jul 12 '25 08:07