[Feature] Is there a prompt for invoking the GUI agent?

Open Larry225 opened this issue 8 months ago • 9 comments

Motivation

Does internvl3 have an official prompt for invoking the GUI agent?

Related resources

No response

Additional context

No response

Larry225 avatar Apr 17 '25 06:04 Larry225

Hello, and thanks for your interest. InternVL3 currently supports mainly GUI grounding. To use it, just change the system prompt to the following.

sys_prompt = You are InternVL, a GUI agent. \n\nYou are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.

For the instructions themselves, you can follow the instruction samples in ScreenSpot, for example:

minimize this window

open the music library

....

We will add support for more GUI capabilities in later updates, so stay tuned!

liu-zhy avatar Apr 18 '25 06:04 liu-zhy

OK, thanks for the reply. So a complete Action instruction like the one used with qwen2.5vl is not supported yet, right? prompt = '''## Task: {instruction} ## History Actions: {operation_history} ## Action Space 1. CLICK([block_index, cx, cy], "text") 2. TYPE("text") 3. PRESS_BACK() 4. PRESS_HOME() 5. PRESS_ENTER() 6. SWIPE_UP() 7. SWIPE_DOWN() 8. SWIPE_LEFT() 9. SWIPE_RIGHT() 10. COMPLETED() ## Requirements: Please infer the next action according to the Task and History Actions. Think step by step. Return with Image Description, Next Action Description and Action Code. The Action Code should follow the definition in the Action Space.'''
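For reference, filling a template like the one quoted above needs only `str.format`. The sketch below shows string assembly only and does not imply that InternVL3 was trained on this format. The line breaks inside the template, the helper name `build_action_prompt`, and the way history entries are joined are all assumptions, since the original prompt is shown on a single line.

```python
# Template text copied from the comment above; the newline placement is an
# assumption because the original was flattened onto one line.
ACTION_PROMPT = (
    "## Task: {instruction}\n"
    "## History Actions: {operation_history}\n"
    "## Action Space\n"
    '1. CLICK([block_index, cx, cy], "text")\n'
    '2. TYPE("text")\n'
    "3. PRESS_BACK()\n"
    "4. PRESS_HOME()\n"
    "5. PRESS_ENTER()\n"
    "6. SWIPE_UP()\n"
    "7. SWIPE_DOWN()\n"
    "8. SWIPE_LEFT()\n"
    "9. SWIPE_RIGHT()\n"
    "10. COMPLETED()\n"
    "## Requirements: Please infer the next action according to the Task and "
    "History Actions. Think step by step. Return with Image Description, "
    "Next Action Description and Action Code. The Action Code should follow "
    "the definition in the Action Space."
)


def build_action_prompt(instruction, history):
    """Fill the two placeholders (hypothetical helper, not a library API)."""
    # Joining with "; " and using "None" for an empty history are assumptions.
    operation_history = "; ".join(history) if history else "None"
    return ACTION_PROMPT.format(
        instruction=instruction, operation_history=operation_history
    )


if __name__ == "__main__":
    print(build_action_prompt("open the music library", []))
```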

Larry225 avatar Apr 18 '25 07:04 Larry225

@liu-zhy Hello, where should system_prompt be set for single-image chat? Following the README example, I put the prompt you mentioned directly in the query and appended the test-set instruction to it. With intern-vl-8B this gives an accuracy below 70 on screenspot-v2. Is something set up incorrectly? Example query: You are InternVL, a GUI agent. \n\nYou are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task. register now for pytorch ans:pyautogui.click(x=0.2346, y=0.381)
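A minimal sketch of how an answer such as `pyautogui.click(x=0.2346, y=0.381)` could be scored for grounding. It assumes the coordinates are normalized to [0, 1], as the sample answer suggests, and that the target box is given as (left, top, right, bottom) in the same normalized units. That box layout and the helper names are assumptions, so check the actual ScreenSpot annotation format before relying on it.

```python
import re

# Matches e.g. "pyautogui.click(x=0.2346, y=0.381)" anywhere in the answer.
CLICK_RE = re.compile(
    r"pyautogui\.click\(\s*x\s*=\s*([0-9]*\.?[0-9]+)\s*,"
    r"\s*y\s*=\s*([0-9]*\.?[0-9]+)\s*\)"
)


def parse_click(answer):
    """Return (x, y) from the first click call in the answer, or None."""
    match = CLICK_RE.search(answer)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def click_hits_box(point, box):
    """True if the point lies inside box = (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


if __name__ == "__main__":
    point = parse_click("pyautogui.click(x=0.2346, y=0.381)")
    # The box below is made-up illustration data, not a ScreenSpot annotation.
    print(point, click_hits_box(point, (0.2, 0.35, 0.3, 0.4)))
```

An answer without a parseable click returns `None`, which an evaluation loop would normally count as a miss.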

TianlongLee avatar Apr 21 '25 11:04 TianlongLee

Is "register now for pytorch ans:" also part of the prompt? If so, please remove it and test again.

liu-zhy avatar Apr 24 '25 10:04 liu-zhy

@liu-zhy Hello, is this the prompt to use when testing internvl3 on screenspot-v2? Should the user query contain only the instruction from screenspot-v2 and nothing else?

system prompt: You are InternVL, a GUI agent. \n\nYou are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.

user query: sign out

chhluo avatar Jul 16 '25 02:07 chhluo

Using the prompt above I could essentially reproduce the result. Original paper (v2): 81.4; my reproduction: 81.6.

First, in the file InternVL3-8B/modeling_internvl_chat.py, around line 268, replace the system prompt: self.system_message = 'You are InternVL, a GUI agent. \n\nYou are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task.'

Then pass the question in directly. The overall prompt looks like this: 'You are InternVL, a GUI agent. \n\nYou are given a task and a screenshot of the screen. You need to perform a series of pyautogui actions to complete the task. \n' + question

The key point is that InternVL3's default system prompt must be replaced. If it is not replaced, accuracy is only in the 70s.
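As an alternative to editing the file, the same replacement might be done on the loaded model object. This is only a sketch: it assumes the model exposes `system_message` as a plain attribute, as the assignment above suggests, and that the attribute is read each time a prompt is built. `use_gui_system_prompt` is a hypothetical helper, and the stand-in object keeps the example runnable without model weights.

```python
from types import SimpleNamespace

# Prompt text taken verbatim from the thread.
GUI_SYSTEM_PROMPT = (
    "You are InternVL, a GUI agent. \n\n"
    "You are given a task and a screenshot of the screen. "
    "You need to perform a series of pyautogui actions to complete the task."
)


def use_gui_system_prompt(model):
    """Overwrite the default system message on an already-loaded model.

    Assumption: the model stores its system prompt in `system_message`.
    """
    model.system_message = GUI_SYSTEM_PROMPT
    return model


if __name__ == "__main__":
    # Stand-in for a real model object, used only to exercise the helper.
    stub = SimpleNamespace(system_message="default system prompt")
    use_gui_system_prompt(stub)
    print(stub.system_message)
```

With the system prompt replaced this way, the question passed to the model would be only the bare instruction, for example `sign out`.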

sugarandgugu avatar Jul 23 '25 09:07 sugarandgugu

Yes, that is about right. My reproduction gives 81.9; you also need to set the maximum number of patches to 12.
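A rough sketch of why the patch cap matters: with dynamic-resolution tiling, the cap bounds how many tiles a screenshot is split into, so a lower cap means a coarser view of small UI elements. The function below is loosely modeled on the aspect-ratio-based tile selection described for InternVL. The name `pick_tile_grid`, the 448-pixel tile size, and the tie-breaking rule are assumptions here, and the real preprocessing should be taken from the repository.

```python
def pick_tile_grid(width, height, max_num=12, min_num=1, image_size=448):
    """Return (cols, rows) of the grid whose aspect ratio is closest to the image.

    Hypothetical helper: only grids with min_num <= cols * rows <= max_num
    are considered, so max_num caps the number of tiles.
    """
    candidates = sorted(
        {
            (cols, rows)
            for n in range(min_num, max_num + 1)
            for cols in range(1, n + 1)
            for rows in range(1, n + 1)
            if min_num <= cols * rows <= max_num
        },
        key=lambda grid: (grid[0] * grid[1], grid),
    )
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff:
            # On a tie, prefer the larger grid only if the image is big enough.
            if width * height > 0.5 * image_size * image_size * cols * rows:
                best = (cols, rows)
    return best


if __name__ == "__main__":
    for cap in (6, 12):
        cols, rows = pick_tile_grid(1920, 1080, max_num=cap)
        print(cap, (cols, rows), cols * rows)
```

For a 1920x1080 screenshot, this sketch picks a 2x1 grid under a cap of 6 and a 4x2 grid under a cap of 12.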

chhluo avatar Jul 23 '25 09:07 chhluo

@liu-zhy Borrowing this thread to ask: does internvl3_5 have a prompt template for the GUI agent?

YaguangGong avatar Sep 28 '25 09:09 YaguangGong

Yes, there is. See https://huggingface.co/OpenGVLab/ScaleCUA-3B

liu-zhy avatar Sep 29 '25 05:09 liu-zhy