UI-TARS-1.5-7B does not output bounding boxes
I noticed that UI-TARS-1.5-7B does not output a bounding box for an element, even when the prompt explicitly asks for one.
Is this because UI-TARS-1.5-7B was trained extensively and exclusively on points rather than bounding boxes?
I also noticed that the action format in the prompt has changed:
Box form: click(start_box='[x1, y1, x2, y2]')
https://github.com/xlang-ai/OSWorld/commit/0bc1e084400e101848dcc48893bf24a0f9e6db2f
https://github.com/bytedance/UI-TARS-desktop/blob/fba1e6bd6de2520043ee1b07a05be2e9f23d1e9a/packages/ui-tars/sdk/src/constants.ts
Point form (current main): click(start_box='<|box_start|>(x1,y1)<|box_end|>')
https://github.com/bytedance/UI-TARS-desktop/blob/main/apps/ui-tars/src/main/agent/prompts.ts
https://github.com/bytedance/UI-TARS/blob/main/prompts.py
Yes, UI-TARS-1.5-7B was trained to output points only, so it does not produce bounding boxes.
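For anyone whose downstream code has to cope with both action formats quoted in this thread, here is a minimal sketch of a parser that reduces either one to a single click point. The function name `parse_start_box` is hypothetical and not part of any UI-TARS package. Taking the center of a box as the click target is my assumption, and the sketch does not rescale coordinates to screen size.

```python
import re

# Hypothetical helper, not part of UI-TARS: extracts the start_box argument
# from an action string and returns a single (x, y) point.
_START_BOX = re.compile(r"start_box='([^']*)'")
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_start_box(action):
    """Return (x, y) for a point, or the center of a box; None if unparseable."""
    match = _START_BOX.search(action)
    if match is None:
        return None
    nums = [float(n) for n in _NUMBER.findall(match.group(1))]
    if len(nums) == 2:
        # Point form: <|box_start|>(x1,y1)<|box_end|>
        return (nums[0], nums[1])
    if len(nums) == 4:
        # Box form: [x1, y1, x2, y2]; use the center (an assumption).
        x1, y1, x2, y2 = nums
        return ((x1 + x2) / 2, (y1 + y2) / 2)
    return None


point = parse_start_box("click(start_box='<|box_start|>(100,200)<|box_end|>')")
center = parse_start_box("click(start_box='[10, 20, 30, 40]')")
```

This only goes from a box to a point. Since the model emits points, recovering an actual bounding box from its output is not possible with this kind of post-processing.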