MobileAgent icon indicating copy to clipboard operation
MobileAgent copied to clipboard

How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

Open fredajiang opened this issue 1 year ago • 1 comments

How to improve the execution speed of OCR, grounding-dino, and chatgpt-4o models to transition mobile-agent from laboratory research to engineering use?

  1. I replaced the original grounding-dino model with a GPU-supported version, reducing the time required from about 7 seconds to just 0.2 seconds. For more details on the GPU version of grounding-dino, please refer to the link: https://github.com/IDEA-Research/GroundingDINO
  2. For the OCR model, is there a similarly faster GPU-supported version? Currently, each OCR operation takes approximately 3 seconds.
  3. For calling chatgpt-4o, do you have any suggestions for improving its execution speed? At present, each call to chatgpt-4o takes approximately 6-7 seconds. Looking forward to your response.

fredajiang avatar Sep 04 '24 08:09 fredajiang