Jiaxuan Liu

Results: 20 comments by Jiaxuan Liu

Switch to the window that shows the labeled smartphone screenshot, then press any key on your keyboard. You can also change cv2.waitKey(0) to cv2.waitKey([time in milliseconds]) to control...

Hi, that was a great question. We just tried letting AppAgent authenticate itself as a human on X (Twitter), and it succeeded! See the following video: https://github.com/mnotgod96/AppAgent/assets/27103154/4a71a942-8ed4-4711-90b3-8a716a4ad4df

This XML dump error usually occurs on a non-static interface, e.g. while a video or an animated GIF is playing. Can you try again on a static app...

If you take a close look at and_controller.py in the scripts directory, you will see that the XML file is dumped using uiautomator. The bounds property can be found in the xml...
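To make the bounds property concrete, here is a minimal sketch of reading it from a uiautomator-style dump. This is not AppAgent's own code: the sample XML is hand-written for illustration, and the helper names parse_bounds and clickable_centers are hypothetical. It assumes only that each node carries a bounds attribute of the form "[x1,y1][x2,y2]", as uiautomator dumps do.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a uiautomator dump (illustrative only).
SAMPLE_XML = """<hierarchy rotation="0">
  <node index="0" class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node index="0" class="android.widget.Button" text="OK" clickable="true" bounds="[100,200][300,260]" />
  </node>
</hierarchy>
"""

# bounds looks like "[x1,y1][x2,y2]": top-left corner, then bottom-right corner.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds):
    """Turn a bounds string into an (x1, y1, x2, y2) tuple of ints."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    return tuple(int(group) for group in match.groups())


def clickable_centers(xml_text):
    """Return (text, (cx, cy)) for every node marked clickable="true"."""
    root = ET.fromstring(xml_text)
    centers = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        x1, y1, x2, y2 = parse_bounds(node.get("bounds", ""))
        centers.append((node.get("text", ""), ((x1 + x2) // 2, (y1 + y2) // 2)))
    return centers


if __name__ == "__main__":
    for text, center in clickable_centers(SAMPLE_XML):
        print(text, center)
```

The center of each rectangle is a natural point to tap, which is why the sketch returns it rather than the raw corners.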

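This looks like purely a network environment issue. I suggest first following the OpenAI documentation to check whether the request path works end to end: https://platform.openai.com/docs/guides/vision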

If you mean operating inside a browser on the phone, you can try Chrome. I once opened YouTube in Chrome and performed some actions there, and it worked.

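We have not tested the agent's capabilities on Gemini yet. In my experience, though, the hard part of sending an email in Gmail is the step where, after filling in the recipient, the model has to tap the email address in the dropdown to confirm the recipient. If the documentation generated by the model does not describe this step, the task is very likely to fail. You could try manually refining the UI documentation, or try some other email apps.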

This looks like an error message returned by the GPT-4V request. I suggest first unit testing ask_gpt4v in model.py to see whether it runs successfully.

Once the phone screenshot window pops up, press any key on your keyboard to continue.

> I run this version, but I got this error in last section.
>
> ```
> oup = model.run_single(source_path, target_path, crop_align=True,cat=True)
> ```
>
> please help me
> ...