AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Hi, thanks for sharing. I am wondering whether you could also share the reference document since it's expensive to use the GPT4V API to construct such a document. In such...
We should merge the JSON handling code (the `content` assembly code) from the transaction layer into the base model layer, and the `ask_gpt4v` method signature would need to be changed to `text,...
Dear Contributors, I hope this message finds you well. I have been thoroughly engaged with the AppAgent framework and am particularly intrigued by the autonomous exploration capabilities that have been...
Hello, good work! I'm one of the authors of AutoDroid, an LLM-based Android task automation approach released several months before AppAgent. We did not advertise our work, so it...
PS D:\PythonWorkSpace\AppAgent> python learn.py Warning! No module named 'sounddevice' Warning! No module named 'keras' Welcome to the exploration phase of AppAgent! The exploration phase aims at generating documentations for UI...
When my code reaches the line `cv2.waitKey(0)`, how am I supposed to provide input? I have pressed various keys on my computer keyboard and also performed corresponding actions on my phone,...
Well, just curious: have you guys come across any captcha on the Android phone? Can the Agent manage to solve it?
Please enter the description of the task you want me to complete in a few sentences: Task: Search for the user Bill Gates and follow him Round 1 Thinking about...
Hi team, I am impressed by the features offered by your work. Congratulations. After trying it on macOS 14.0 from a MacBook M2, I encountered two issues: 1. I was...
When running the human demonstration phase, I could only see half of the labeled screenshots, as shown in the picture below, and this window could not be changed in size...