yikangshao

Results 3 comments of yikangshao

> Have you tested the official [tutorial](https://github.com/bytedance/UI-TARS/blob/main/README_coordinates.md)? In actual use, you need to make sure the resolution fed to the model is exactly the same as the one used in post-processing (1.5 uses absolute coordinates, so a resolution mismatch has a large impact; this is quite different from 1.0).

Hello, I have a question about these absolute coordinates: 1. With the following code, the results I get are fairly close: `model = Qwen2_5_VLForConditionalGeneration.from_pretrained( qwen_path, torch_dtype=torch.bfloat16, # attn_implementation="flash_attention_2", device_map="cuda" ) processor = AutoProcessor.from_pretrained(qwen_path) image = Image.open(img_path) inputs = processor( text=[text], images=[image], padding=True, return_tensors="pt", ).to('cuda') output_ids`...
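To illustrate the resolution point in the quoted comment, here is a minimal sketch. It assumes the model returns absolute pixel coordinates in the resolution of the image it actually received, and that the screenshot was uniformly resized before inference. The helper `rescale_point` and the example sizes are hypothetical and are not taken from the UI-TARS tutorial; how the image is resized depends on the processor and is not shown here.

```python
def rescale_point(x, y, model_size, screen_size):
    """Map a point from the model-input resolution back to the screen resolution.

    model_size and screen_size are (width, height) tuples in pixels.
    Hypothetical helper for illustration only.
    """
    model_w, model_h = model_size
    screen_w, screen_h = screen_size
    if model_w <= 0 or model_h <= 0:
        raise ValueError("model_size must contain positive dimensions")
    # Scale each axis independently, then round to the nearest pixel.
    return round(x * screen_w / model_w), round(y * screen_h / model_h)


# Example: the model saw a 960x540 image, the real screen is 1920x1080.
click_x, click_y = rescale_point(480, 270, (960, 540), (1920, 1080))
```

If the two resolutions are identical, the function returns the point unchanged. Any mismatch between the size assumed here and the size the model really saw shifts the click position proportionally.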

This leaves the desktop unable to locate the correct position, so it cannot click and the task cannot be executed correctly.

Same bug with ui-tars-72b-dpo, and it happens frequently.