gui-agents topic
Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Agent-S
Agent S: an open agentic framework that uses computers like a human
UI-TARS-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Screen-Point-and-Read
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
gelab-zero
GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.
ScaleCUA
ScaleCUA is the open-sourced computer use agents that can operate on cross-platform environments (Windows, macOS, Ubuntu, Android).
UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
UI-Ins
Official implementation of UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
GUI-RCPO
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
monday
[CVPR 2025] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents