CogVLM
Official CogAgent demo code has a bug in bounding box generation
@zRzRzRzRzRzRzR
System Info
Python 3.10.0, Transformers 4.36.2, Linux
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts and tasks
Reproduction
- Download the repo.
- Run `python cli_demo_sat.py --from_pretrained cogagent-chat --version chat --bf16 --stream_chat`.
- Use a screenshot taken on a phone as the image.
- Use the prompt `What steps do I need to take to 'click the Chrome icon'?(with grounding)`.
- The following error is then raised at this line: https://github.com/THUDM/CogVLM/blob/f7283b2c8d26cd7f932d9a5f7f5f9307f568195d/utils/utils/grounding_parser.py#L86
```
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/blocks.py", line 1562, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/home/ubuntu/miniconda3/envs/cogagent/lib/python3.10/site-packages/gradio/utils.py", line 865, in wrapper
    response = f(*args, **kwargs)
  File "/home/ubuntu/CogVLM/basic_demo/web_demo_simple.py", line 175, in easy_submit
    return post(input_text, temperature, top_p, top_k, image_prompt, "", "", state)[1][0][1]
  File "/home/ubuntu/CogVLM/basic_demo/web_demo_simple.py", line 126, in post
    response, _, cache_image = chat(
  File "/home/ubuntu/CogVLM/utils/utils/chat.py", line 147, in chat
    parse_response(pil_img, response)
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 86, in parse_response
    draw_boxes(new_img, boxes, texts, output_fn=output_fn)
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in draw_boxes
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in <listcomp>
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
  File "/home/ubuntu/CogVLM/utils/utils/grounding_parser.py", line 15, in <listcomp>
    absolute_boxes = [[(int(box[0] * width), int(box[1] * height), int(box[2] * width), int(box[3] * height)) for box in b] for b in boxes]
IndexError: list index out of range
```
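Inspecting the parsed boxes shows that the model outputs only two values per box (an `(x, y)` point), but `draw_boxes` indexes four (`box[0]` through `box[3]`), so the list comprehension raises `IndexError`.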
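A possible direction for a fix is to make the coordinate conversion check the length of each entry instead of assuming four values. The sketch below is a hypothetical replacement for the list comprehension on line 15 of `grounding_parser.py`; the helper name `to_absolute_boxes` and the `point_radius` parameter are illustrative, not part of the CogVLM code. It assumes, as the original `int(box[0] * width)` expressions suggest, that coordinates are already normalized to `[0, 1]`, and it expands a two-value point into a small square so it can still be drawn.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def to_absolute_boxes(
    boxes: Sequence[Sequence[Sequence[float]]],
    width: int,
    height: int,
    point_radius: int = 5,
) -> List[List[Box]]:
    """Hypothetical helper: convert normalized coordinates to pixel boxes.

    Assumes each coordinate is normalized to [0, 1], mirroring the
    multiplication by width/height in the original list comprehension.
    Each inner entry is either [x0, y0, x1, y1] (a box) or [x, y] (a point).
    """
    result: List[List[Box]] = []
    for group in boxes:
        converted: List[Box] = []
        for coords in group:
            if len(coords) == 4:
                # Regular box: scale each corner to pixel coordinates.
                x0, y0, x1, y1 = coords
                converted.append((int(x0 * width), int(y0 * height),
                                  int(x1 * width), int(y1 * height)))
            elif len(coords) == 2:
                # Point: expand to a small square, clamped to the image bounds.
                cx, cy = int(coords[0] * width), int(coords[1] * height)
                converted.append((max(cx - point_radius, 0),
                                  max(cy - point_radius, 0),
                                  min(cx + point_radius, width - 1),
                                  min(cy + point_radius, height - 1)))
            # Any other length is treated as malformed output and skipped
            # rather than raising IndexError.
        result.append(converted)
    return result
```

For example, `to_absolute_boxes([[[0.25, 0.5, 0.75, 1.0]], [[0.5, 0.25]]], 800, 400)` returns `[[(200, 200, 600, 400)], [(395, 95, 405, 105)]]`. Whether a point should be drawn as a small square, a marker, or skipped entirely is a design choice for the maintainers.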
Expected behavior
The demo should handle this model output without crashing.