UI-TARS
More details about OSWorld running script.
Hi, the argument values are here: https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/uitars_agent.py#L404
https://github.com/xlang-ai/OSWorld/blob/884676cebc4300983cf1285177b30dcc8ef25746/mm_agents/uitars_agent.py#L413
Is the observation a screenshot plus the a11y tree? That is confusing, since UI-TARS is trained on screenshot-only observations.
Any progress on this?