Official CogAgent demo code has a bug in bounding box generation

Open BiEchi opened this issue 1 year ago • 0 comments

@zRzRzRzRzRzRzR

System Info

Python 3.10.0, Transformers 4.36.2, Linux

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts and tasks

Reproduction

  1. Download the repo.
  2. Run: python cli_demo_sat.py --from_pretrained cogagent-chat --version chat --bf16 --stream_chat
  3. Use a phone screenshot as the input image.
  4. Enter the prompt: What steps do I need to take to 'click the Chrome icon'?(with grounding)
  5. The run then fails at this line: https://github.com/THUDM/CogVLM/blob/f7283b2c8d26cd7f932d9a5f7f5f9307f568195d/utils/utils/grounding_parser.py#L86 with the following traceback:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/blocks.py", line 1562, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/utils.py", line 865, in wrapper
    response = f(*args, **kwargs)
  File "/home/ubuntu/CogVLM/basic_demo/web_demo_simple.py", line 175, in easy_submit
    return post(input_text, temperature, top_p, top_k, image_prompt, "", "", state)[1][0][1]
  File "/home/ubuntu/CogVLM/basic_demo/web_demo_simple.py", line 126, in post
    response, _, cache_image = chat(
  File "/home/ubuntu/CogVLM/utils/utils/chat.py", line 147, in chat
    parse_response(pil_img, response)
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 86, in parse_response
    draw_boxes(new_img, boxes, texts, output_fn=output_fn)
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in draw_boxes
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in <listcomp>
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in <listcomp>
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
IndexError: list index out of range

Inspecting the parsed boxes shows that each one contains only two coordinates, but draw_boxes indexes four (box[0] through box[3]), which raises the IndexError.
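
As a possible workaround, the scaling step could accept both shapes. The sketch below is not the repository's code: to_absolute is a hypothetical helper, and treating a two-coordinate entry as a single (x, y) point is an assumption about what the model meant. It only mirrors the normalized-to-pixel scaling visible in the traceback.

```python
from typing import List, Sequence, Tuple


def to_absolute(
    boxes: Sequence[Sequence[Sequence[float]]],
    width: int,
    height: int,
) -> List[List[Tuple[int, ...]]]:
    """Scale normalized coordinates to pixels (hypothetical helper).

    Each entry is either a box (x0, y0, x1, y1) or, by assumption,
    a point (x, y). Any other length is rejected explicitly instead of
    failing later with an IndexError.
    """
    absolute = []
    for group in boxes:
        scaled = []
        for box in group:
            if len(box) == 4:
                # Regular bounding box: same scaling as in the traceback.
                scaled.append((
                    int(box[0] * width), int(box[1] * height),
                    int(box[2] * width), int(box[3] * height),
                ))
            elif len(box) == 2:
                # Assumed to be a point: scale x by width, y by height.
                scaled.append((int(box[0] * width), int(box[1] * height)))
            else:
                raise ValueError(f"expected 2 or 4 coordinates, got {len(box)}")
        absolute.append(scaled)
    return absolute


if __name__ == "__main__":
    # One group with a point, one group with a full box.
    print(to_absolute([[[0.5, 0.25]], [[0.25, 0.5, 0.75, 1.0]]], 1000, 800))
```

The drawing code would still need to decide how to render a two-element entry (for example as a small marker), which this sketch leaves open.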

Expected behavior

The demo should run without raising the IndexError above.

BiEchi, Nov 26 '24 19:11