UI-TARS
UI-TARS prompt for visual grounding only
Currently the prompt needs a task description and an action history. Is it possible to use UI-TARS for visual grounding only? What prompt did you use for visual grounding benchmarking? Thank you.
Here is an example prompt for a visual grounding task.
"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. \n\n## Output Format\n\nAction: ...\n\n\n## Action Space\nclick(start_box='<|box_start|>(x1,y1)<|box_end|>')\n\n## User Instruction\n{instruction}
"""
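For anyone wiring this up, here is a minimal sketch of how the template above might be filled in and how a reply could be parsed. The helper names `build_grounding_prompt` and `parse_click` are hypothetical, and the reply format is an assumption based on the action space shown in the prompt; the sketch does not call the model and does not assume any particular coordinate scale.

```python
import re
from typing import Optional, Tuple

# Prompt template copied from the reply above; {instruction} is the only placeholder.
GROUNDING_PROMPT = (
    "You are a GUI agent. You are given a task and your action history, with screenshots. "
    "You need to perform the next action to complete the task. \n\n"
    "## Output Format\n\nAction: ...\n\n\n"
    "## Action Space\nclick(start_box='<|box_start|>(x1,y1)<|box_end|>')\n\n"
    "## User Instruction\n{instruction}\n"
)

# Assumed reply shape: Action: click(start_box='<|box_start|>(x,y)<|box_end|>')
# The box tokens are treated as optional in case they are stripped during decoding.
_CLICK_RE = re.compile(
    r"click\(start_box='(?:<\|box_start\|>)?\((\d+),\s*(\d+)\)(?:<\|box_end\|>)?'\)"
)


def build_grounding_prompt(instruction: str) -> str:
    """Insert the description of the element to locate into the template."""
    return GROUNDING_PROMPT.format(instruction=instruction)


def parse_click(response: str) -> Optional[Tuple[int, int]]:
    """Return the (x, y) pair from a click action, or None if none is found."""
    match = _CLICK_RE.search(response)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(build_grounding_prompt("the Submit button"))
    # Hypothetical model reply, used only to exercise the parser.
    print(parse_click("Action: click(start_box='<|box_start|>(235,512)<|box_end|>')"))
```

The screenshot itself would be passed alongside this text through whatever inference client you use; how the returned coordinates map to pixels depends on the model's preprocessing, so check that before scoring.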
@JjjFangg Thank you!