OmniParser
A simple screen parsing tool towards pure vision based GUI agent
Using an NVIDIA RTX 4090 server with 64 GB of memory on Linux, it was very slow: after uploading a screenshot, it took more than 1 hour with no result. Does anything need to be checked? GPU...
entry.sh called the following scripts to initiate qemu arguments: ` cd /run . reset.sh # Initialize system . define.sh # Define versions . install.sh # Run installation . disk.sh #...
{ "som_image_base64": "", "parsed_content_list": [ { "type": "text", "bbox": [ 0.12475135177373886, 0.16597998142242432, 0.1639670431613922, 0.1924661546945572 ], "interactivity": false, "content": "@E=", "source": "box_ocr_content_ocr" }, { "type": "text", "bbox": [ 0.4279624819755554, 0.16597998142242432, 0.4882068634033203,...
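A response shaped like the one above can be walked with a few lines of Python. This is only a sketch: it assumes each `bbox` holds normalized `[x1, y1, x2, y2]` ratios of the image size, which the values in the excerpt suggest but do not confirm. The helper names `bbox_to_pixels` and `describe` are hypothetical, and the sample values are illustrative rather than taken from the report.

```python
import json


def bbox_to_pixels(bbox, width, height):
    """Scale a normalized [x1, y1, x2, y2] box to pixel coordinates.

    Assumption: values are ratios in the range 0..1 relative to the image size.
    """
    x1, y1, x2, y2 = bbox
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


def describe(result, width, height):
    """Return (content, pixel_bbox) pairs for every parsed element."""
    return [
        (item["content"], bbox_to_pixels(item["bbox"], width, height))
        for item in result["parsed_content_list"]
    ]


# Illustrative payload using the same keys as the excerpt; the numbers are made up.
sample = json.loads("""
{
  "som_image_base64": "",
  "parsed_content_list": [
    {"type": "text", "bbox": [0.125, 0.25, 0.5, 0.75],
     "interactivity": false, "content": "example",
     "source": "box_ocr_content_ocr"}
  ]
}
""")

if __name__ == "__main__":
    for content, box in describe(sample, 1920, 1080):
        print(content, box)
```

Converting to pixels this way makes it easier to compare a reported box against the screenshot in an image viewer.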
Does it support iOS or macOS? Or will it be supported in the future?
### Add `weights_install.py` for Cross-Platform Weight Installation **Description:** This PR introduces a new Python script, `weights_install.py`, to simplify downloading model weights using the Hugging Face CLI. The script ensures compatibility...
Hello. Great tool, thank you. How can I debug the result? I'm trying to understand why `101`, `110`, `115` areas appeared. - `101` overlaps the drag and drop ID. -...
Currently, if no text elements are detected by the OCR, OmniParser will throw an exception because the ocr_bbox is None (and not iterable). To fix, initialize to empty list and...
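The guard described above could look roughly like the following sketch. Only the name `ocr_bbox` comes from the report; `normalize_ocr_boxes` and `count_boxes` are hypothetical helpers, not functions from the OmniParser code base, and the actual fix in the repository may differ.

```python
from typing import List, Optional, Sequence

BBox = Sequence[float]


def normalize_ocr_boxes(ocr_bbox: Optional[Sequence[BBox]]) -> List[BBox]:
    """Return a list that is always safe to iterate over.

    When OCR finds no text, the boxes may arrive as None; map that to [].
    """
    return [] if ocr_bbox is None else list(ocr_bbox)


def count_boxes(ocr_bbox: Optional[Sequence[BBox]]) -> int:
    """Hypothetical downstream step that iterates over the OCR boxes."""
    boxes = normalize_ocr_boxes(ocr_bbox)
    return sum(1 for _ in boxes)


if __name__ == "__main__":
    print(count_boxes(None))                      # no OCR hits: loop simply does not run
    print(count_boxes([[0.1, 0.1, 0.2, 0.2]]))    # one detected box
```

Normalizing once at the boundary keeps every later loop free of `None` checks.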
Hey there. I realized OmniTool had problems with inserting multi-line text. I tried solving it by splitting the multi-line string and typing the lines separately, but it caused problems with...
Python Version: `Python 3.12.9` PIP: ``` Package Version ------------------------- ----------- accelerate 1.4.0 aiofiles 23.2.1 aiohappyeyeballs 2.4.6 aiohttp 3.11.12 aiosignal 1.3.2 albucore 0.0.13 albumentations 1.4.10 altair 5.5.0 annotated-types 0.7.0 anthropic 0.46.0...