fix: enable parsing of OpenAI-compatible agent mode messages
## Description
This implements proper parsing of agent-mode messages for Qwen3 and other OpenAI-compatible models that support tool use. Agent mode for OpenAI-compatible models will now work correctly, with the exception of messages that include multiple tool calls in a single message; the existing code does not support that case, and handling it would require further changes of unknown scope.
Fixes #5419.
## Checklist
- [x] I've read the contributing guide
- [x] The relevant docs, if any, have been updated or created
- [x] The relevant tests, if any, have been updated or created
## Screenshots
See issue #5419 for "before" screenshots.
## Tests
Added a new test suite for the `sessionSlice` reducer's `streamUpdate` function, covering all cases related to this functionality.
Are the PR test failures just flakes? Reading through the logs, many of these tests don't even reach the point where `extest run-tests` is called; they fail with 5xx errors during VS Code extension installation. I don't see any reason this change would cause those failures.
The tests ran on June 12 @ 8:07 PM PST, which was not at the same time as the big Google outage on June 12 between 1 PM and 4 PM PST.
I think the tests should be re-run.
That was my thinking as well. I don't have permission to re-run the tests, so someone associated with the project will need to do it.
@Patrick-Erichsen any time for this one? Issue #5419 had interest from a few of us.
@shssoichiro can you provide more details about what the fixes are? Is it just thinking tags? Since this touches sensitive streaming logic, a thorough description of the problem and solution would be good.
If the only changes are for thinking tags: I am looking to add stream "middleware" that will intercept thinking XML tags as they arrive and support partial tag streaming, which neither this PR nor the current implementation supports.
Any other changes here?
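For concreteness, the "middleware" idea described above could be sketched roughly as follows: a stateful transformer that routes text inside `<think>...</think>` to a reasoning channel and handles a tag split across stream chunks by holding back any trailing partial-tag prefix. All names here are illustrative assumptions, not Continue's actual streaming code.

```typescript
// Hypothetical sketch of a thinking-tag stream filter (not Continue internals).
type Piece = { kind: "reasoning" | "content"; text: string };

class ThinkTagFilter {
  private buffer = "";
  private inThink = false;

  // Feed one stream chunk; returns the pieces that are safe to emit now.
  push(chunk: string): Piece[] {
    this.buffer += chunk;
    const out: Piece[] = [];
    for (;;) {
      const tag = this.inThink ? "</think>" : "<think>";
      const idx = this.buffer.indexOf(tag);
      if (idx === -1) {
        // No complete tag yet: hold back a trailing partial prefix like "<thi".
        const keep = this.partialTagLength(tag);
        const safe = this.buffer.slice(0, this.buffer.length - keep);
        if (safe) out.push(this.piece(safe));
        this.buffer = this.buffer.slice(this.buffer.length - keep);
        return out;
      }
      const before = this.buffer.slice(0, idx);
      if (before) out.push(this.piece(before));
      this.buffer = this.buffer.slice(idx + tag.length);
      this.inThink = !this.inThink;
    }
  }

  private piece(text: string): Piece {
    return { kind: this.inThink ? "reasoning" : "content", text };
  }

  // Length of the longest buffer suffix that is a proper prefix of tag.
  private partialTagLength(tag: string): number {
    for (let n = Math.min(tag.length - 1, this.buffer.length); n > 0; n--) {
      if (this.buffer.endsWith(tag.slice(0, n))) return n;
    }
    return 0;
  }
}
```

The key design point is that the filter never emits text that could be the start of a tag; it waits for the next chunk to disambiguate, which is what makes partial tag streaming work.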
Yes, the bulk of the issue is around thinking tags, and specifically how OpenAI-compatible models such as Qwen3 send them. Sometimes they send them in a streaming format that Continue already handles, but other times they send the entire message in one chunk, which looks something like:

```
<think>Okay, I am an AI model and I am thinking about something</think>
Hello user, here is the message that I am displaying to you.
```

These messages can also have tool calls attached. Continue currently doesn't support this non-streaming message format, so this change adds support for it.
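To illustrate the single-chunk case, a parse step along these lines would split a complete message into its reasoning and visible parts. This is a hedged sketch with hypothetical names, not the PR's actual implementation.

```typescript
// Illustrative sketch: splitting a full message that arrives in one chunk.
interface SplitMessage {
  reasoning: string | null; // text inside <think>...</think>, if present
  content: string;          // the user-visible remainder
}

// Matches a complete think block at the start of the chunk.
const THINK_BLOCK = /^\s*<think>([\s\S]*?)<\/think>\s*/;

function splitThinkBlock(raw: string): SplitMessage {
  const match = raw.match(THINK_BLOCK);
  if (!match) {
    // No complete think block: treat the whole chunk as visible content.
    return { reasoning: null, content: raw };
  }
  return {
    reasoning: match[1].trim(),
    content: raw.slice(match[0].length),
  };
}
```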
Thanks, I've updated the code to place the think block messages into the reasoning section of the previous message, as we do elsewhere. It was easier to keep the overall structure of the PR, where we parse full blocks first, due to the different handling required for splitting them.
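The placement described above (attaching parsed think-block text to the previous message's reasoning rather than emitting it as a standalone message) could look something like the following. Field and function names are assumptions for illustration only.

```typescript
// Illustrative-only sketch of attaching reasoning to the prior message.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  reasoning?: string;
}

function attachReasoning(history: ChatMessage[], reasoning: string): ChatMessage[] {
  const last = history[history.length - 1];
  if (!last || last.role !== "assistant") {
    // No assistant message to attach to: start a new one.
    return [...history, { role: "assistant", content: "", reasoning }];
  }
  // Append to any existing reasoning on the previous assistant message.
  const merged: ChatMessage = {
    ...last,
    reasoning: (last.reasoning ?? "") + reasoning,
  };
  return [...history.slice(0, -1), merged];
}
```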
As a note, I ran into difficulties with manual retesting due to `Cannot find module '@continuedev/openai-adapters'` errors while attempting to update dependencies. I'm not sure whether that is an issue with my environment or with current main, but it seems unrelated to the code affected by this PR.
It seems the local dev issues were indeed unrelated, I opened a separate PR to fix it: https://github.com/continuedev/continue/pull/6426
@RomneyDa this seems ready to go but will wait on your approval here.
@shssoichiro thanks for the updates. Will take a look!
:tada: This PR is included in version 1.1.0 :tada: