
Compatibility Issues with Qwen Series Models (VL, QVQ-max) via LiteLLM Proxy

Open Uc207Pr4f57t9-251 opened this issue 2 months ago • 10 comments

I've been attempting to integrate Alibaba Cloud's Qwen series models (specifically dashscope/qwen-vl-max and dashscope/qvq-max) into Bytebot using the recommended LiteLLM proxy setup (local Docker Compose). While basic connectivity was established after significant debugging (related to agent authentication and build caching), severe compatibility issues remain with these specific models, preventing their effective use.

1. Qwen-VL Models (e.g., qwen-vl-max) - Non-Standard Tool Calling:

  • Problem: When bytebot-agent sends a request with tools defined and tool_choice: "auto", qwen-vl-max (via LiteLLM) does not populate the standard tool_calls field in the response. Instead, it returns tool_calls: null and embeds the intended tool calls as JSON code blocks (e.g., ```json { "name": "...", "input": {...} } ```) directly within the message.content field, often mixed with natural language "thinking" text.
  • Impact: bytebot-agent's current response parser (formatChatCompletionResponse in proxy.service.ts) only checks the message.tool_calls field. Since that field is null, the agent fails to recognize or execute the tool calls and treats the entire content (including the JSON blocks) as plain text output to the user. (An illustrative comparison of the two response shapes is shown after this list.)
  • Debugging Done:
    • Confirmed via direct curl/Python requests to litellm-proxy that the issue persists even when bytebot-agent is bypassed, proving it's an incompatibility between Qwen-VL's output format and the standard expected by the agent.
    • Ensured reasoning_effort parameter was removed from bytebot-agent source code (proxy.service.ts) via --no-cache builds, ruling it out as the cause.
    • Configured tool_choice: "auto" via LiteLLM UI ("Default Parameters") for the model, which did not resolve the issue.
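
For illustration, here is a simplified sketch of the two response shapes involved (TypeScript-style object literals; the tool name computer_screenshot and the content text are placeholders I made up, not captured output):

    // Expected OpenAI-style response: tool calls arrive in message.tool_calls.
    const expected = {
      message: {
        content: null,
        tool_calls: [
          {
            id: "call_1",
            type: "function",
            function: { name: "computer_screenshot", arguments: "{}" },
          },
        ],
      },
    };

    // Observed Qwen-VL behaviour: tool_calls is null and the intended call is
    // embedded as a fenced JSON block inside content, mixed with "thinking" text.
    const observed = {
      message: {
        content:
          'Let me take a screenshot first.\n```json\n{ "name": "computer_screenshot", "input": {} }\n```',
        tool_calls: null,
      },
    };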

2. QVQ/QVQ-max Models - HTTPS Connection Error:

  • Problem: Attempts to use dashscope/qvq-max consistently fail with the error litellm.BadRequestError: DashscopeException - current user api does not support http call.
  • Debugging Done:
    • Verified multiple times that the api_base configured in the LiteLLM UI for this model is correctly set to HTTPS: https://dashscope.aliyuncs.com/compatible-mode/v1.
    • This error occurs even when the Docker Desktop global proxy and any system-level proxies (like Clash) are completely disabled, ruling out proxy HTTPS-stripping.
    • Other Dashscope models (like qwen-vl-max) connect successfully over HTTPS using the same LiteLLM proxy setup.
  • Conclusion: This suggests a specific issue either with the Dashscope endpoint for qvq-max when accessed via the OpenAI compatibility layer, or with how the LiteLLM adapter handles HTTPS requests for this specific model variant only. (A direct-probe sketch against the Dashscope endpoint is included below.)
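
One further diagnostic worth trying is calling the Dashscope compatible-mode endpoint directly over HTTPS, bypassing LiteLLM entirely. Below is a minimal sketch (my own suggestion, not project code; it assumes a DASHSCOPE_API_KEY environment variable and Node 18+ for the global fetch, and QVQ models may additionally require streaming, in which case stream would need to be true):

    const apiBase = "https://dashscope.aliyuncs.com/compatible-mode/v1";

    async function probeQvqMax(): Promise<void> {
      const res = await fetch(`${apiBase}/chat/completions`, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.DASHSCOPE_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "qvq-max",
          messages: [{ role: "user", content: "ping" }],
          stream: false, // flip to true if the endpoint rejects non-streaming calls
        }),
      });
      // If this direct HTTPS call succeeds while the same model fails through
      // litellm-proxy, the problem is more likely in LiteLLM's Dashscope handling
      // than in the Dashscope endpoint itself.
      console.log(res.status, await res.text());
    }

    probeQvqMax().catch(console.error);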

Environment:

  • Bytebot: Built locally from source (recent edge equivalent).
  • LiteLLM: Running via Docker using ghcr.io/berriai/litellm:main-stable image.
  • Models Tested: dashscope/qwen-vl-max, dashscope/qvq-max.
  • Setup: Local Docker Compose on Windows, using the project's provided postgres container for both bytebot-agent and litellm-proxy databases (bytebotdb and litellm_logs_db respectively). LiteLLM configured with master_key and encryption_key.

Workarounds Attempted:

  • Qwen-VL: Manually modified bytebot-agent's formatChatCompletionResponse function in proxy.service.ts to parse ```json ... ``` blocks from message.content when message.tool_calls is null. (Code based on our discussion can be provided if needed; a minimal sketch of the approach is shown after this list.) This works but requires modifying the agent source.
  • QVQ-max: No workaround found. Model remains unusable due to the persistent HTTPS error.
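
For reference, a minimal sketch of the fallback idea (not the exact code from my modified agent; the ParsedToolCall shape is just an illustration):

    interface ParsedToolCall {
      name: string;
      input: Record<string, unknown>;
    }

    // Extract tool calls that Qwen-VL embeds as ```json blocks in message.content.
    function extractEmbeddedToolCalls(content: string): ParsedToolCall[] {
      const calls: ParsedToolCall[] = [];
      const blockRegex = /```json\s*([\s\S]*?)```/g;
      for (const match of content.matchAll(blockRegex)) {
        try {
          const parsed = JSON.parse(match[1]);
          if (parsed && typeof parsed.name === "string") {
            calls.push({ name: parsed.name, input: parsed.input ?? {} });
          }
        } catch {
          // Skip blocks that are not valid JSON.
        }
      }
      return calls;
    }

    // Idea: inside formatChatCompletionResponse, when message.tool_calls is null
    // or empty, run extractEmbeddedToolCalls(message.content) and map the results
    // onto the agent's internal tool-call representation.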

Suggestions:

  1. Enhance bytebot-agent Parser: Update formatChatCompletionResponse to include fallback logic that parses JSON code blocks from message.content if message.tool_calls is null/empty. This would provide out-of-the-box compatibility with models like Qwen-VL.
  2. Investigate QVQ HTTPS Issue: This seems like a deeper issue, potentially within LiteLLM's Dashscope adapter or the Dashscope endpoint itself. Collaboration with the LiteLLM team might be needed.
  3. Document Compatibility: Update Bytebot documentation regarding known compatibility issues with specific Qwen models and potential workarounds (like the agent code modification).
  4. Agent Authentication: Address the underlying issue where bytebot-agent ignores standard proxy keys and requires hardcoding or the OPENAI_API_KEY workaround.

Thanks for looking into this. Qwen models are important in certain regions, and improving compatibility would be very beneficial.

Uc207Pr4f57t9-251 avatar Oct 26 '25 08:10 Uc207Pr4f57t9-251

This is particularly important. To the best of my knowledge, Qwen VL is also better suited for computer-use agents (with the latest open-source model Qwen3 VL), so integrating it is definitely crucial. I am using OpenRouter in my bytebot fork to use the deployed Qwen3VL. Btw, a quick question: do you think using LiteLLM is better?

BuesrB avatar Oct 26 '25 13:10 BuesrB

Hi @BuesrB Your approach using OpenRouter in a Bytebot fork sounds quite interesting, especially since you mentioned Qwen3 VL seems well-suited for this type of agent work. I took a look at your GitHub profile hoping to find the fork you mentioned and learn more, but I wasn't able to locate it. Would you be willing to share a link to your fork if it's public? More specifically, I'm particularly curious about how you handled the Qwen VL tool calling incompatibility we discussed. Does OpenRouter handle this adaptation automatically for Qwen models, or did you need to implement specific adapter logic (e.g., parsing the content field) or perhaps use targeted prompt engineering within your fork to get the tool calls working reliably? Thanks!

Uc207Pr4f57t9-251 avatar Oct 28 '25 05:10 Uc207Pr4f57t9-251

@Uc207Pr4f57t9-251

I made a small change to support passing the model name when using litellm and was able to get bytebot to work with qwen3-vl. It can open a browser, but fails on other tool calls due to:

https://github.com/bytebot-ai/bytebot/issues/153

So it looks like there is some command mapping to do.

vxtra1973 avatar Oct 29 '25 11:10 vxtra1973

The fork is here: https://github.com/kira-id/cua.kira . We have improved it: it now works with Qwen3VL, supports taking over the desktop directly from the home screen, adds Blender on the Desktop, merges some useful PRs from this repo, and more. Could you please take a look and see if it works for you?

The adapter logic is modified accordingly, continuing PR https://github.com/bytebot-ai/bytebot/pull/145 . I do believe this adapter logic could be standardized across Grok, Anthropic, OpenAI, etc., with only minimal modification for each provider; that should be the intention of the proxy approach, I suppose. In the fork they are still kept separate, which gives better clarity and easier development.

The main next step I see comes from the fact that no LLM I have tested so far is able to do CUA that well. In particular, it misclicks buttons a lot and often misses by a few pixels. Almost there, but not quite. So I suppose this is the next step in making CUA actually useful.

samkoesnadi avatar Oct 29 '25 13:10 samkoesnadi

The command mapping is definitely the big thing to do. We have done it here https://github.com/kira-id/cua.kira . I think I have tagged you in another issue haha

samkoesnadi avatar Oct 29 '25 13:10 samkoesnadi

Was just about to leave a comment to reply, but thank you so much @samkoesnadi, and yes, I totally agree with you. In particular, the misclicks and the incorrect cursor coordinates are creating lots of errors. But yes, it is almost there. So anyone who is interested and curious is welcome to test it out and leave a comment :) https://github.com/kira-id/cua.kira

BuesrB avatar Oct 29 '25 13:10 BuesrB

May I ask how to solve this error I'm getting when using it?

    bytebot-agent | If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.. Received Model Group=openrouter/qwen/qwen3-vl-32b-instruct
    bytebot-agent | Available Model Group Fallbacks=None
    bytebot-agent | Error: 400 litellm.UnsupportedParamsError: openrouter does not support parameters: ['reasoning_effort'], for model=qwen/qwen3-vl-32b-instruct. To drop these, set litellm.drop_params=True or for proxy:
    bytebot-agent |
    bytebot-agent | litellm_settings:
    bytebot-agent |   drop_params: true
    bytebot-agent | .
    bytebot-agent | If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.. Received Model Group=openrouter/qwen/qwen3-vl-32b-instruct
    bytebot-agent | Available Model Group Fallbacks=None
    bytebot-agent |     at APIError.generate (/app/bytebot-agent/node_modules/openai/core/error.js:45:20)
    bytebot-agent |     at OpenAI.makeStatusError (/app/bytebot-agent/node_modules/openai/client.js:158:32)
    bytebot-agent |     at OpenAI.makeRequest (/app/bytebot-agent/node_modules/openai/client.js:301:30)
    bytebot-agent |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    bytebot-agent |     at async ProxyService.generateMessage (/app/bytebot-agent/dist/proxy/proxy.service.js:44:32)
    bytebot-agent |     at async AgentProcessor.runIteration (/app/bytebot-agent/dist/agent/agent.processor.js:137:29)

liliangdao avatar Oct 30 '25 07:10 liliangdao

Hi @liliangdao,

This is a known issue that stems from the bytebot-agent's source code, not your model configuration. The agent hardcodes a reasoning_effort: 'high' parameter in all its requests, which openrouter does not support, leading to the 400 UnsupportedParamsError you're seeing.

As you noted in the logs, the fix is to configure your litellm-proxy to silently drop this unsupported parameter before it reaches OpenRouter.

The Solution

  1. Open your litellm-proxy configuration file, located at packages/bytebot-llm-proxy/litellm-config.yaml.

  2. Add the drop_params: true line inside the litellm_settings: block.

    litellm_settings:
      debug: true 
      detailed_debug: true
      encryption_key: "your_encryption_key" # or other settings...
    
      # --- Add this line ---
      drop_params: true 
      # ---------------------
    
  3. After saving the file, you must rebuild your Docker containers for the litellm-proxy to load this new setting. Run:

    # Make sure to include all your .yml files (e.g., -f docker-compose.yml)
    docker compose up -d --build 
    

    This will resolve the reasoning_effort error.


A Note on This Topic

Just a heads-up, this specific reasoning_effort / drop_params bug has been discussed in detail in Issue #151 (as seen in this comment: https://github.com/bytebot-ai/bytebot/issues/151#issuecomment-3466966448).

To keep the conversation focused, it's best to discuss any further issues related to this specific parameter in that thread. This current issue is tracking the separate, more complex problem of Qwen models returning a non-standard tool call format (i.e., embedding JSON in the content field instead of using the tool_calls array).

Uc207Pr4f57t9-251 avatar Oct 30 '25 14:10 Uc207Pr4f57t9-251

@samkoesnadi Thanks for sharing your fork (https://github.com/kira-id/cua.kira). I cloned your repo and ran docker-compose up using the main docker-compose.yml file. However, when the UI starts, it seems unable to load any of the default models (the model list appears empty).

I tried to find the setup instructions for OpenRouter but couldn't locate a specific guide or .env.example for it. I did see your Pull Request #2 ("cursor openrouter"), so I assumed the required variable might be OPENROUTER_API_KEY.

I added my key to a .env file in the root (OPENROUTER_API_KEY=sk-or-XXXX...), took the other configs from docker/.env.example, and ran docker compose up -d again, but I'm still seeing the same issue (no models loading).

I feel like I must be missing a configuration step. Is OPENROUTER_API_KEY the correct environment variable, or is there another file I need to set up to get the agent to connect to OpenRouter and load the Qwen3VL models?

docker output:

[Nest] 18  - 10/30/2025, 3:12:19 PM    WARN [AgentAnalyticsService] BYTEBOT_ANALYTICS_ENDPOINT is not set. Analytics service disabled.

[Nest] 18  - 10/30/2025, 3:12:19 PM    WARN [AnthropicService] ANTHROPIC_API_KEY is not set. AnthropicService will not work properly.

[Nest] 18  - 10/30/2025, 3:12:19 PM    WARN [OpenAIService] OPENAI_API_KEY is not set. OpenAIService will not work properly.

[Nest] 18  - 10/30/2025, 3:12:19 PM    WARN [GoogleService] GEMINI_API_KEY is not set. GoogleService will not work properly.

[Nest] 18  - 10/30/2025, 3:12:19 PM    WARN [ProxyService] BYTEBOT_LLM_PROXY_URL is not set. ProxyService will not work properly.

It seems no OPENROUTER_API_KEY is loaded?

Uc207Pr4f57t9-251 avatar Oct 30 '25 15:10 Uc207Pr4f57t9-251

Hi, thanks for trying it out. You mentioned you put the .env in the root? Maybe that's the culprit; it should be under docker/, i.e., docker/.env. I hope this fixes it.

samkoesnadi avatar Oct 30 '25 16:10 samkoesnadi