MobileAgent How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

Open fredajiang opened this issue 1 year ago • 1 comments

How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

I replaced the original grounding-dino model with a GPU-supported version, reducing the time required from about 7 seconds to just 0.2 seconds. For more details on the GPU version of grounding-dino, please refer to the link: https://github.com/IDEA-Research/GroundingDINO
For the OCR model, is there a similarly faster GPU-supported version? Currently, each OCR operation takes approximately 3 seconds.
For calling chatgpt-4o, do you have any suggestions for improving its execution speed? At present, each call to chatgpt-4o takes approximately 6-7 seconds. Looking forward to your response.

Sep 04 '24 08:09 fredajiang

MobileAgent MobileAgent copied to clipboard

How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

MobileAgent
MobileAgent copied to clipboard