MobileAgent
How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?
- I replaced the original grounding-dino model with a GPU-supported version, reducing the time required from about 7 seconds to just 0.2 seconds. For more details on the GPU version of grounding-dino, please refer to the link: https://github.com/IDEA-Research/GroundingDINO
- For the OCR model, is there a similarly fast GPU-supported version? Currently, each OCR operation takes approximately 3 seconds.
- For calls to chatgpt-4o, do you have any suggestions for improving execution speed? At present, each call takes approximately 6-7 seconds.

Looking forward to your response.
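For reference, per-stage timings like the ones above can be collected with a small harness along these lines. This is only a minimal sketch: `run_ocr`, `run_grounding`, and `call_llm` are hypothetical stand-ins (they just sleep) and are not functions from the mobile-agent code, so they would need to be replaced with the real OCR, grounding-dino, and chatgpt-4o calls.

```python
import time
from contextlib import contextmanager

# Wall-clock durations in seconds, collected per stage name.
timings = {}


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(stage, []).append(time.perf_counter() - start)


# Hypothetical stand-ins for the real models; each one only sleeps briefly.
def run_ocr(image):
    time.sleep(0.03)
    return ["Settings"]


def run_grounding(image):
    time.sleep(0.002)
    return [(0, 0, 10, 10)]


def call_llm(prompt):
    time.sleep(0.06)
    return "tap(5, 5)"


def run_step(image):
    """Run one agent step and time each stage separately."""
    with timed("ocr"):
        texts = run_ocr(image)
    with timed("grounding"):
        boxes = run_grounding(image)
    with timed("llm"):
        action = call_llm(f"texts={texts}, boxes={boxes}")
    return action


def summarize(stage_timings):
    """Return the mean duration in seconds for each stage."""
    return {stage: sum(v) / len(v) for stage, v in stage_timings.items()}


if __name__ == "__main__":
    for _ in range(3):
        run_step(image=None)
    for stage, mean in summarize(timings).items():
        print(f"{stage}: {mean:.3f} s per call")
```

Timing each stage separately over several steps makes it easier to see which model dominates the latency of a step and whether a change such as the GPU grounding-dino swap actually helps.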