About reproducing training_example.json

Open MickeyFei opened this issue 9 months ago • 3 comments

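I found the original mind2web content for the example training data, but the coordinates do not match my calculation. Could you share how the coordinates are computed? My understanding is that the coordinate (495,30) in the first dialogue turn of training_example.json is a relative coordinate. The image for this sample has a resolution of 1280*720, so the computed absolute coordinate is (633.6,21.6). The first box in the corresponding original data has the coordinates
"bbox": { "x": 595.25, "y": 0.0, "width": 118.15625, "height": 60.0 },
so the absolute coordinate of the box's center point is (654,30). Is an error of this size expected, and how does the training data process the boxes?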

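The arithmetic in the question can be checked with a short sketch. It assumes a normalization scale of 1000, which is what the numbers in the question imply; the helper names `relative_to_absolute` and `bbox_center` are hypothetical and are not taken from the UI-TARS code base.

```python
def relative_to_absolute(x_rel, y_rel, width, height, scale=1000):
    """Map a point given on a 0..scale grid to pixel coordinates."""
    return x_rel / scale * width, y_rel / scale * height


def bbox_center(bbox):
    """Center of a box given as top-left corner plus width and height."""
    return bbox["x"] + bbox["width"] / 2, bbox["y"] + bbox["height"] / 2


# Values quoted in the question; scale=1000 is an assumption.
bbox = {"x": 595.25, "y": 0.0, "width": 118.15625, "height": 60.0}
abs_point = relative_to_absolute(495, 30, width=1280, height=720)
center = bbox_center(bbox)

print(abs_point)  # approximately (633.6, 21.6)
print(center)     # (654.328125, 30.0)
```

Under these assumptions the two points differ by roughly 21 pixels horizontally and 8 pixels vertically, which is the mismatch being asked about.
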
MickeyFei avatar Mar 04 '25 03:03 MickeyFei

All input data coordinates are normalized within the range of 0-1000 as relative coordinates, without involving absolute coordinate inputs.

JjjFangg avatar Mar 04 '25 03:03 JjjFangg

> All input data coordinates are normalized within the range of 0-1000 as relative coordinates, without involving absolute coordinate inputs.

Sorry, I misspoke: it is a relative coordinate. How do you process a box to obtain its corresponding relative coordinates?

MickeyFei avatar Mar 04 '25 03:03 MickeyFei

Consider an image with a resolution of 1920 × 1080 pixels, and a bounding box with absolute coordinates:

| Corner | Absolute Coordinates (X, Y) |
| --- | --- |
| Top-left | (640, 270) |
| Bottom-right | (1280, 810) |

Formula for Relative Coordinates

x' = (x / W) * 1000
y' = (y / H) * 1000

Where:

  • x, y are the absolute coordinates.
  • W, H are the image width and height.
  • The multiplication by 1000 ensures normalized coordinates fall within [0, 1000].

Converted Relative Coordinates

| Corner | Calculation | Relative Coordinates (X', Y') |
| --- | --- | --- |
| Top-left | (640 / 1920) * 1000, (270 / 1080) * 1000 | (333.33, 250.00) |
| Bottom-right | (1280 / 1920) * 1000, (810 / 1080) * 1000 | (666.67, 750.00) |
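
The conversion above can be written as a minimal Python sketch. The function names `absolute_to_relative` and `corners_to_relative` are hypothetical, chosen only for illustration, and rounding to two decimals is an assumption made to match the figures in the table.

```python
def absolute_to_relative(x, y, width, height, scale=1000):
    """Apply x' = (x / W) * scale and y' = (y / H) * scale."""
    return x / width * scale, y / height * scale


def corners_to_relative(x1, y1, x2, y2, width, height, ndigits=2):
    """Convert the top-left and bottom-right corners of a box, rounding each value."""
    top_left = absolute_to_relative(x1, y1, width, height)
    bottom_right = absolute_to_relative(x2, y2, width, height)
    return (
        tuple(round(v, ndigits) for v in top_left),
        tuple(round(v, ndigits) for v in bottom_right),
    )


# The 1920 x 1080 example from the table.
top_left, bottom_right = corners_to_relative(640, 270, 1280, 810, 1920, 1080)
print(top_left)      # (333.33, 250.0)
print(bottom_right)  # (666.67, 750.0)

# The same formula applied to the box center from the original question
# (1280 x 720 image, center at (654.328125, 30.0)).
question_center = absolute_to_relative(654.328125, 30.0, 1280, 720)
print(question_center)  # approximately (511.19, 41.67)
```

Applied to the box center from the original question, this formula gives roughly (511.19, 41.67) rather than (495, 30), so the formula alone does not appear to account for the value in training_example.json.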

JjjFangg avatar Mar 05 '25 10:03 JjjFangg

Hi, sorry to bother you, but I've been following the algorithm you provided and I still can't figure out how you got the coordinates (495, 30).

@MickeyFei I was just wondering if the calculation method is clear to you.

zackschen avatar Jul 01 '25 08:07 zackschen