UI-TARS

Wrong coordinates for very big screen images

Open tcnguyen opened this issue 7 months ago • 2 comments

Hello, we observed UI-TARS 1.5 7B giving wrong coordinates for large screen images (3840x2100). When we resized the images to be smaller, it worked correctly.

What maximum image size do you recommend? Thank you.

tcnguyen • May 06 '25 11:05

Thank you for reaching out.

This behavior does seem unusual. Our test set does include high-resolution images (including resolutions comparable to 3840×2100), and so far we have not observed significant grounding issues or systematic deviations in the model’s performance at larger image sizes.

To better understand the issue, could you kindly share some specific examples where the model produces incorrect coordinates on high-resolution input? This will help us investigate further and determine whether it’s input-dependent or resolution-sensitive.

JjjFangg • May 06 '25 13:05

@JjjFangg Hi, thank you for your response. I'll prepare some examples and share them with you.

tcnguyen • May 16 '25 13:05
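
For readers trying the workaround from the original report (downscaling the screenshot before inference), here is a minimal sketch of the arithmetic involved. It is not taken from the UI-TARS codebase: `fit_within_budget`, `to_original_coords`, and the `MAX_PIXELS` value are hypothetical names and numbers chosen for illustration, and the thread does not state a recommended limit. The sketch assumes the model returns absolute pixel coordinates in the space of the downscaled image, so they have to be scaled back to the original resolution before clicking.

```python
import math


def fit_within_budget(width, height, max_pixels):
    """Return (new_width, new_height) with the same aspect ratio
    (up to rounding) and an area of at most max_pixels."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Floor both sides so the resulting area never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def to_original_coords(x, y, resized_size, original_size):
    """Map a point predicted on the resized image back to the original image."""
    resized_w, resized_h = resized_size
    original_w, original_h = original_size
    return round(x * original_w / resized_w), round(y * original_h / resized_h)


# Placeholder budget, NOT a value recommended by the UI-TARS maintainers.
MAX_PIXELS = 1920 * 1080

original = (3840, 2100)  # resolution mentioned in the report
resized = fit_within_budget(*original, MAX_PIXELS)

# Suppose the model predicts the centre of the resized screenshot.
predicted = (resized[0] // 2, resized[1] // 2)
click_at = to_original_coords(*predicted, resized, original)
print(resized, predicted, click_at)
```

The actual image resize would be done with an imaging library using the size returned by `fit_within_budget`; only the size and coordinate bookkeeping is shown here. If coordinates are still off after rescaling, comparing results at several budgets may help tell a resolution effect from an input-specific one.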