UFO
How strong does a model need to be to run UFO?
In actual use, I found that overall latency is high, mainly because requests to the cloud LLM are time-consuming. Is there a quantitative analysis of how strong a model needs to be to run UFO on the device side?
It is best to work with models that have:
- good support for system prompts (very important). If the model doesn't support system prompts, try moving the system prompt into the user prompt by modifying the prompts under ufo/prompts.
- vision capability (also important)
- stable JSON output
- low hallucination
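For the first point, the workaround of folding the system prompt into the user prompt can be sketched as a small message transform. This is only an illustrative helper (the function name and message shape are assumptions); in UFO itself you would edit the prompt templates under ufo/prompts instead.

```python
def fold_system_prompt(messages):
    """Merge all system messages into the first user message.

    Illustrative sketch of the workaround for models without
    system-prompt support; not part of the UFO codebase.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if system_parts and rest and rest[0]["role"] == "user":
        # Prepend the system text to the first user turn.
        rest[0] = {
            "role": "user",
            "content": "\n\n".join(system_parts) + "\n\n" + rest[0]["content"],
        }
    return rest
```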
According to our experiments on WAA and OSWorld-W, even older models (with relatively low LiveBench scores) such as gpt-4o-20240806 work well with UFO. However, if a model doesn't satisfy the requirements above, the results can be very poor. We are working on adding the structured-output feature of the OpenAI API to get more stable JSON output.
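For context, the OpenAI Chat Completions API supports structured outputs via the `response_format` parameter with a JSON schema. A minimal sketch of building such a request payload is shown below; the schema name and fields are hypothetical and do not reflect UFO's actual action schema.

```python
def build_structured_request(model, messages):
    """Build a Chat Completions request asking for schema-constrained
    JSON output. The "agent_action" schema here is illustrative only."""
    return {
        "model": model,
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "agent_action",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "action": {"type": "string"},
                        "target": {"type": "string"},
                    },
                    "required": ["action", "target"],
                    "additionalProperties": False,
                },
            },
        },
    }
```

With `strict: True`, the API constrains the model's output to match the schema, which removes most JSON-parsing failures that otherwise occur with weaker models.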