OmniParser
Error When Testing Icon Classification with BLIP-2 Model
Description:
When running the command below to test icon classification with the BLIP-2 model, I encountered a tensor mismatch error. Despite following the instructions referenced in the error message, the issue persists.
python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model blip2
Steps to Reproduce:
- Run the following command:
python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model blip2
- Observe the output log (a standalone sketch of the failing caption call follows after the log):
Using MPS device.
Loading checkpoint shards: 100%|...| 2/2 [00:36<00:00, 18.20s/it]
* Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
start processing
datetime: 2025-01-15 16:02:40.503170
image 1/1 ...binary_image_demo.png: 576x1280 21 icons, 429.1ms
Speed: 3.4ms preprocess, 429.1ms inference, 18.8ms postprocess per image at shape (1, 3, 576, 1280)
Expanding inputs for image tokens in BLIP-2 should be done in processing. Please follow instructions here: (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042)...
Sizes of tensors must match except in dimension 1. Expected size 19 but got size 1 for tensor number 1 in the list.
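For reference, the failing caption step can be reproduced outside Gradio with a short script. This is only a sketch of my understanding of that step, not the actual OmniParser code: the model ID matches the one used by gradio_demo.py, but demo.png, the crop boxes, and the prompt are placeholders standing in for the detector output.

```python
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

device = "mps" if torch.backends.mps.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

# Crop a few regions from a screenshot; these boxes are placeholders, not the
# actual detector output.
screenshot = Image.open("demo.png").convert("RGB")
boxes = [(0, 0, 64, 64), (64, 0, 128, 64), (128, 0, 192, 64)]
crops = [screenshot.crop(box) for box in boxes]

# Batched prompt + image captioning; the real prompt used by gradio_demo.py may differ.
prompt = "The image shows"
inputs = processor(
    images=crops, text=[prompt] * len(crops), return_tensors="pt", padding=True
).to(device, torch.float16)

# On my setup, this generate() call is where the error is raised:
# "Sizes of tensors must match except in dimension 1. Expected size 19 but got size 1 ..."
outputs = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(outputs, skip_special_tokens=True))
```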
Environment:
- OS: macOS
- Device: MPS (Metal Performance Shaders)
- Python Version: 3.12
- Gradio Version: 5.12.0
- BLIP-2 Model: Salesforce/blip2-opt-2.7b
- Icon Detection Model Path: weights/icon_detect_v1_5/model_v1_5.pt
Additional Notes:
- The error message points to a specific [gist](https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042), but even after following its instructions, the issue persists (the steps I applied are sketched below).
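For context, the update I applied from the gist was along these lines. This is a sketch based on my reading of the gist, not of the OmniParser code; the attributes num_query_tokens and image_token_index are the ones the gist refers to.

```python
import torch
from transformers import AddedToken, Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
)

# Let the processor expand the image tokens instead of the model doing it at runtime.
processor.num_query_tokens = model.config.num_query_tokens

# Register a dedicated <image> token and point the model config at it.
image_token = AddedToken("<image>", normalized=False, special=True)
processor.tokenizer.add_tokens([image_token], special_tokens=True)
model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64)
model.config.image_token_index = len(processor.tokenizer) - 1
```

Even with these changes applied, the same tensor size mismatch is raised.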
Let me know if any additional information or clarification is needed. Thanks!