OmniParser
A simple screen parsing tool towards pure vision based GUI agent
When I try to run the script omniparser.py, it shows the following error while downloading the OCR module: `download https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar to /users/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer/en_PP-OCRv3_det_infer.tar [2025/01/30 07:38:01] ppocr INFO: Downloading en_PP-OCRv3_det_infer.tar from https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar...`
When I run `python weights/convert_safetensor_to_pt.py`, it shows the error above. I have also checked that the Hugging Face repo doesn't provide train_args.yaml in the folder icon/detect, which is version...
I tried versions 1 and 1.5. Neither outputs any coordinates. Am I missing something?
This PR closes #35 by addressing the issue of `__pycache__` directories being present in the repository, despite being included in `.gitignore`. The `__pycache__` directories contain Python bytecode files (`.pyc`) that...
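For context on why this can happen: `.gitignore` only affects untracked files, so directories that were committed before the ignore rule was added stay in the repository. The following is a minimal sketch, not the PR's actual change, for locating such directories in a working tree; the helper name `find_pycache_dirs` is hypothetical.

```python
from pathlib import Path


def find_pycache_dirs(root):
    """Return every __pycache__ directory below `root`, sorted by path.

    Hypothetical helper for illustration; it only lists directories
    and does not delete or untrack anything.
    """
    return sorted(p for p in Path(root).rglob("__pycache__") if p.is_dir())


if __name__ == "__main__":
    # List the bytecode cache directories under the current directory.
    for path in find_pycache_dirs("."):
        print(path)
```

Each listed directory can then be untracked without deleting it locally using `git rm -r --cached <dir>`, after which the existing `.gitignore` rule keeps it out of future commits.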
Line 110 of utils.py: ```generated_ids = model.generate(input_ids=inputs["input_ids"], pixel_values=inputs["pixel_values"], max_new_tokens=100, num_beams=3, do_sample=False)``` This line runs forever on a MacBook Pro (M2 Pro). Versions: paddleocr==2.9.1, paddlepaddle==3.0.0b1, numpy==1.26.4. I traced it into the library code. Florence2ForConditionalGeneration class _encode_image...
I cannot get the result picture when I run it on Ubuntu or Windows. It fails with "TypeError: 'NoneType' object cannot be interpreted as an integer". If someone knows...
Hi everyone, I have a few questions about the image requirements: 1. What's the minimum [device pixel ratio](https://developer.mozilla.org/en-US/docs/Web/CSS/@media/-webkit-device-pixel-ratio) for an image that can be passed to get proper results? Currently we send...
My aim is to get the coordinates of some icon in the picture/screenshot. How can I do that?
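One possible approach, sketched under an assumption: if the parser reports each detected element as a bounding box in normalized (x1, y1, x2, y2) ratios of the image size, the box can be converted to a pixel coordinate for its center. The box format and the helper name `box_center_px` are assumptions for illustration, not something confirmed in this thread.

```python
def box_center_px(box_ratio, image_width, image_height):
    """Convert a normalized (x1, y1, x2, y2) box to its center in pixels.

    Assumes each value in `box_ratio` is a fraction of the image width
    (x values) or height (y values), in the range 0..1.
    """
    x1, y1, x2, y2 = box_ratio
    center_x = (x1 + x2) / 2 * image_width
    center_y = (y1 + y2) / 2 * image_height
    return round(center_x), round(center_y)


if __name__ == "__main__":
    # A box covering the lower-middle part of a 1920x1080 screenshot.
    print(box_center_px((0.25, 0.5, 0.75, 1.0), 1920, 1080))
```

If the boxes are already in pixels, the multiplication by width and height should be dropped.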
I saw a demonstration video on the homepage, using a tool called orca agent to generate operations. Is this tool open source?