yikangshao comments

Results: 3 comments of yikangshao

The v1.5 7B model scores far below the v1 2B model in the element_ocr scenario; is this expected?

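> Have you tried the official [tutorial](https://github.com/bytedance/UI-TARS/blob/main/README_coordinates.md)? In actual use, you need to make sure that the resolution of the image fed to the model is exactly the same as the resolution used in post-processing (1.5 uses absolute coordinates, so a resolution mismatch has a large impact; this is quite different from 1.0).

Hello, I have a question about these absolute coordinates. 1. With the following code, the results I measure are fairly close: `model = Qwen2_5_VLForConditionalGeneration.from_pretrained( qwen_path, torch_dtype=torch.bfloat16, # attn_implementation="flash_attention_2", device_map="cuda" ) processor = AutoProcessor.from_pretrained(qwen_path) image = Image.open(img_path) inputs = processor( text=[text], images=[image], padding=True, return_tensors="pt", ).to('cuda') output_ids...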
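To make the resolution point in the quoted reply concrete, here is a minimal sketch of mapping an absolute coordinate back to the original screenshot. It assumes the model reports pixel coordinates on the resized image that the processor actually fed to it; `rescale_point`, `model_size`, and `original_size` are hypothetical names for illustration, not part of UI-TARS or Transformers, and the official tutorial remains the reference for how the resized dimensions are computed.

```python
from typing import Tuple


def rescale_point(
    x: float,
    y: float,
    model_size: Tuple[int, int],
    original_size: Tuple[int, int],
) -> Tuple[float, float]:
    """Map a point from the model-input image back to the original image.

    Assumption: (x, y) are absolute pixel coordinates on the resized image
    of size model_size = (width, height). Both sizes are supplied by the
    caller; this helper does not compute the resize itself.
    """
    model_w, model_h = model_size
    orig_w, orig_h = original_size
    if model_w <= 0 or model_h <= 0:
        raise ValueError("model_size must contain positive dimensions")
    # Scale each axis independently by the ratio of original to model size.
    return (x * orig_w / model_w, y * orig_h / model_h)


# Example: a point predicted on a 1000x500 input, mapped to a 2000x1000 screenshot.
print(rescale_point(500, 250, (1000, 500), (2000, 1000)))
```

If post-processing assumes a different `model_size` than the one the processor really used, every point is shifted by the same ratio error, which is consistent with the sensitivity described in the quoted reply.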

Click position incorrect: expected a number but got a letter at "start_box" in the output

This leaves the desktop unable to locate the correct position, so it cannot click and the task cannot be executed correctly.
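One way to guard against this on the client side is to validate the output before clicking. The sketch below assumes the action text looks like `click(start_box='(x,y)')`, optionally wrapped in `<|box_start|>` and `<|box_end|>` tokens; that format and the helper name `parse_start_box` are assumptions for illustration, not the project's actual parser.

```python
import re
from typing import Optional, Tuple

# Assumed shape: start_box='...' or start_box="..." somewhere in the action text.
START_BOX_RE = re.compile(r"start_box\s*=\s*['\"](?P<body>[^'\"]*)['\"]")
BOX_TOKEN_RE = re.compile(r"<\|box_(?:start|end)\|>")
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_start_box(action: str) -> Optional[Tuple[float, ...]]:
    """Return the numeric coordinates in start_box, or None if it is malformed."""
    match = START_BOX_RE.search(action)
    if match is None:
        return None  # no start_box argument at all
    # Drop the optional box tokens before inspecting the payload.
    body = BOX_TOKEN_RE.sub("", match.group("body"))
    if re.search(r"[A-Za-z]", body):
        return None  # letters such as "x1" where numbers were expected
    numbers = NUMBER_RE.findall(body)
    if len(numbers) not in (2, 4):
        return None  # accept only a point (x, y) or a box (x1, y1, x2, y2)
    return tuple(float(n) for n in numbers)


print(parse_start_box("click(start_box='(235,512)')"))
print(parse_start_box("click(start_box='(x1,y1)')"))
```

A caller could retry or skip the step when the helper returns `None`, rather than passing a non-numeric position to the click handler.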

fails to output coordinates

Same bug with ui-tars-72b-dpo, and it happens frequently.