Interface-Agent
InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.
🤔 What is InterfaceAgent?
InterfaceAgent is a framework for building system and interface agents that can operate mobile and desktop applications and their features.
Here are the key capabilities of InterfaceAgent:
- Planning & Goal Refinement: The agent builds multi-step plans that span several applications to fulfill a user request, and it refines those plans based on user feedback during the evaluation phase.
- Action Prediction (Pure Visual / Textual / Set-of-Mark Visual Prompting): InterfaceAgent predicts the next action using a visual coordinate-based approach, pure DOM textual analysis, or Set-of-Mark visual prompting.
- Mixture of Models: InterfaceAgent works with both GPT-4V and Claude models, which can determine the next step directly from page screenshots.
- Resilient Error Handling: Errors are an inherent part of AI agents, so InterfaceAgent includes a retry mechanism with exponential backoff. This lets an agent recover from temporary failures and keep making progress.
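To make the retry-with-exponential-backoff idea concrete, here is a minimal TypeScript sketch. It is not InterfaceAgent's actual implementation: the names `RetryOptions`, `backoffDelayMs`, and `retryWithBackoff`, the doubling factor, and the delay cap are all assumptions chosen for illustration.

```typescript
// Hypothetical option names for illustration; not part of InterfaceAgent's API.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number; // upper bound on any single wait
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Wait before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
function backoffDelayMs(retryIndex: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** retryIndex);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `action` until it succeeds or attempts run out, waiting longer after each failure.
async function retryWithBackoff<T>(action: () => Promise<T>, options: RetryOptions): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // No point in sleeping after the final failed attempt.
      if (attempt === options.maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

An agent step that may fail transiently (for example, a model call or a UI action) could be wrapped as `retryWithBackoff(() => step(), { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 8000 })`, where `step` is a placeholder for your own function. Production code often adds random jitter to the delay so that concurrent agents do not retry in lockstep.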
InterfaceAgent's OS-specific agents extend the core toolkit with automation tailored to the target platform:
- Preview of iOS Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
- Preview of Windows Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Windows 11 device.
- Preview of Appium Android Agents (Coming soon): Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
- Playwright-based Web Agents (Coming soon): Learn how to build Web AI Agent Companions.
💻 Getting Started
Clone the repository, or install InterfaceAgent with npm, yarn, or pnpm.
- For Core, see installation steps.
- For iOS, see installation steps.
- For Windows, see installation steps.
🎬 Demos
Windows
1) User Query: Help me download an app named EdgeTile
2) User Query: Dropshipping products on TikTok
iOS
User Query: Help me prepare for a 30 days of fitness challenge
🚀 Challenges and Focus
InterfaceAgent still faces challenges in long-horizon planning and in the accuracy of selector inference. The current focus is on making its agents more stable.
🤓 Contributing
We welcome contributions. Please follow the standard fork-and-pull-request workflow.
🛂 License
InterfaceAgent is licensed under the MIT License.
🚑 Support
For support, questions, or feature requests, open an issue in the GitHub repository.