UFO icon indicating copy to clipboard operation
UFO copied to clipboard

how strong the model is required to run UFO

Open zkfzle opened this issue 7 months ago • 1 comments

After actual experience, I found that the overall time delay was high, mainly because requesting cloud LLM was time-consuming. Is there a quantitative analysis, about how strong the model is required to run UFO on the device side?

zkfzle avatar May 14 '25 08:05 zkfzle

It is best to work with models with:

  • good support for system prompt (very important). If the model doesn't support system prompt, try moving the system prompt into user prompt by modifying prompts under ufo/prompts.
  • vision capability (also important)
  • stable JSON output
  • low hallucination

According to our experiments on WAA and OSWorld-W, even old models (with relatively low LiveBench scores) like gpt-4o-20240806 works well with UFO. However, if a model doesn't satisfy requirements above, the results could be very bad. We are working on adding the structured output feature from OpenAI API to get more stable JSON data.

nice-mee avatar May 15 '25 03:05 nice-mee