UFO
How strong does a model need to be to run UFO?
In actual use, I found that overall latency is high, mainly because requests to the cloud LLM are time-consuming. Is there a quantitative analysis of how strong a model needs to be to run UFO on the device side?
It is best to work with models that have:
- good support for system prompts (very important). If the model doesn't support system prompts, try moving the system prompt into the user prompt by modifying the prompts under ufo/prompts.
- vision capability (also important)
- stable JSON output
- low hallucination
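For the first point, the workaround of folding the system prompt into the user prompt can be sketched as a small message transform. This is only an illustrative helper (the function name and message shape are assumptions); in UFO itself you would edit the prompt templates under ufo/prompts instead.

```python
def fold_system_prompt(messages):
    """Merge all system messages into the first user message.

    Illustrative sketch of the workaround for models without
    system-prompt support; not part of the UFO codebase.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if system_parts and rest and rest[0]["role"] == "user":
        # Prepend the system text to the first user turn.
        rest[0] = {
            "role": "user",
            "content": "\n\n".join(system_parts) + "\n\n" + rest[0]["content"],
        }
    return rest
```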
According to our experiments on WAA and OSWorld-W, even older models (with relatively low LiveBench scores) such as gpt-4o-20240806 work well with UFO. However, if a model doesn't satisfy the requirements above, the results can be very poor. We are working on adding the structured-output feature of the OpenAI API to get more stable JSON output.
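For context, the OpenAI Chat Completions API supports structured outputs via the `response_format` parameter with a JSON schema. A minimal sketch of building such a request payload is shown below; the schema name and fields are hypothetical and do not reflect UFO's actual action schema.

```python
def build_structured_request(model, messages):
    """Build a Chat Completions request asking for schema-constrained
    JSON output. The "agent_action" schema here is illustrative only."""
    return {
        "model": model,
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "agent_action",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "action": {"type": "string"},
                        "target": {"type": "string"},
                    },
                    "required": ["action", "target"],
                    "additionalProperties": False,
                },
            },
        },
    }
```

With `strict: True`, the API constrains the model's output to match the schema, which removes most JSON-parsing failures that otherwise occur with weaker models.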