
Tools not invoked when using Qwen2VL weights with OpenThinkIMG

Guo-Yilong opened this issue 5 months ago · 3 comments

Hello, when I test OpenThinkIMG with Qwen2VL weights on the OpenThinkIMG-Chart-Test-994 dataset, the tools are never invoked and the model outputs its answer directly. How can I solve this problem?

Guo-Yilong avatar Jul 03 '25 02:07 Guo-Yilong

I encountered the same problem. I also tried the checkpoint provided on Hugging Face, and the result seems to be the same. Have you solved this?

yxizhong avatar Jul 07 '25 06:07 yxizhong

I added the system_prompt used for RL training to the inference-phase code, and the model now calls the tools:

"""You are a visual assistant capable of generating and solving steps for chart-based reasoning. Your goal is to answer chart-related questions. You can rely on your own capabilities or use external tools to assist in solving. Here are the available actions:
        - **OCR**: Extracts text from an image. Example: `{"name": "OCR", "arguments": {"image": "img_1"}}`
        - **Point**: Identifies a point in the image based on description and returns coordinates. Example: `{"name": "Point", "arguments": {"image": "img_1", "param": "x-axis value 1970"}}`
        - **ZoomInSubfigure**: Crops the image to the specified subfigure. Example: `{"name": "ZoomInSubfigure", "arguments": {"image": "img_1", "param": "Downstream vs. Concept: Toy"}}`
        - **SegmentRegionAroundPoint**: Segments a region around a given point. Example: `{"name": "SegmentRegionAroundPoint", "arguments": {"image": "img_1", "param": "x=\"21.5\" y=\"28.5\""}}`
        - **DrawHorizontalLineByY**: Draws a horizontal line at a given y-coordinate. Example: `{"name": "DrawHorizontalLineByY", "arguments": {"image": "img_1", "param": "y=28.5"}}`
        - **DrawVerticalLineByX**: Draws a vertical line at a given x-coordinate. Example: `{"name": "DrawVerticalLineByX", "arguments": {"image": "img_1", "param": "x=21.5"}}`
        - **Terminate**: Ends the task and provides the final answer. Example: `{"name": "Terminate", "arguments": {"ans": "1985"}}`
        To solve the problem:
        1. Select actions from the provided tools list, combining them logically and building on previous steps. Call one action at a time, using its output for the next.
        2. To use `SegmentRegionAroundPoint`, `DrawHorizontalLineByY`, or `DrawVerticalLineByX`, first call "Point" to get coordinates for further actions.
        Your output should be in a strict JSON format as follows:
        {"thought": "the reasoning process", "actions": [{"name": "action", "arguments": {"argument1": "value1", "argument2": "value2"}}]}
        """

yxizhong avatar Jul 15 '25 03:07 yxizhong

Hi, I tried adding the same system_prompt in the inference phase, but I ran into an error. Could you please share how you added the system prompt in your code? Thanks a lot!

Here’s the error log I got:

2025-11-05 16:49:11 | ERROR | stderr | [rank2]: inputs = self.form_input_from_dynamic_batch(batch)
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: File "/root/work/filestorage/gaoshan/projects/OpenThinkIMG/tool_server/tf_eval/models/qwen2vl.py", line 155, in form_input_from_dynamic_batch
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: image_inputs, _ = process_vision_info(messages)
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: File "/root/work/filestorage/gaoshan/conda_envs/qwen2_5vl/lib/python3.10/site-packages/qwen_vl_utils/vision_process.py", line 364, in process_vision_info
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: image_inputs.append(fetch_image(vision_info))
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: File "/root/work/filestorage/gaoshan/conda_envs/qwen2_5vl/lib/python3.10/site-packages/qwen_vl_utils/vision_process.py", line 116, in fetch_image
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: image_obj = Image.open(image)
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: File "/root/work/filestorage/gaoshan/conda_envs/qwen2_5vl/lib/python3.10/site-packages/PIL/Image.py", line 3465, in open
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: fp = builtins.open(filename, "rb")
2025-11-05 16:49:11 | ERROR | stderr | [rank2]: OSError: [Errno 36] File name too long: '/9j/4AAQSkZJRgABAQAAAQABAAD/...'
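The `OSError: File name too long` suggests the `image` field in the message holds a raw base64 JPEG payload (`/9j/` is base64 for the JPEG SOI marker), which `qwen_vl_utils.fetch_image` falls through to treating as a filesystem path. One possible fix, an assumption to verify against your installed `qwen_vl_utils` version (which recognizes image strings starting with `data:image`), is to wrap the payload in a data URI, or decode it yourself and pass image bytes/a PIL image instead of a string:

```python
import base64

# Assumption: the message's "image" value is a bare base64 string like
# "/9j/4AAQ...". Wrapping it in a data URI lets fetch_image decode it
# instead of treating it as a (too long) file path.
def to_data_uri(raw_b64: str) -> str:
    """Wrap a bare base64 image payload in a data URI."""
    return "data:image;base64," + raw_b64

# Alternatively, decode it yourself and hand the bytes to PIL directly.
def decode_image_bytes(raw_b64: str) -> bytes:
    return base64.b64decode(raw_b64)

# Tiny synthetic JPEG-like payload just to illustrate the round trip.
jpeg_b64 = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 16).decode()
uri = to_data_uri(jpeg_b64)
```

With the data URI in place, `process_vision_info(messages)` should load the image rather than crash in `builtins.open`, though I have not confirmed this against the OpenThinkIMG inference path.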


> I've added system_prompt for RL training to the inference phase code, and it looks like it's ready to call the tool […]

GaoXiaoshan avatar Nov 05 '25 09:11 GaoXiaoshan