UI-TARS icon indicating copy to clipboard operation
UI-TARS copied to clipboard

UI-TARS-1.5-7B element locator accuracy significantly degraded

Open NEOOOOOOOOOO opened this issue 8 months ago • 5 comments

When using the previous version of the model, UI-TARS-7B, for Web UI automation, element location was relatively accurate and stable. However, after upgrading to UI-TARS-1.5-7B, the accuracy of element positioning has significantly decreased. As a result, many of our existing automation test cases are no longer executable or fail frequently.

NEOOOOOOOOOO avatar Apr 22 '25 07:04 NEOOOOOOOOOO

We have updated the tutorial on coordinate processing.

JjjFangg avatar Apr 25 '25 03:04 JjjFangg

Thanks for your reply, I’ll give it a try later.

NEOOOOOOOOOO avatar Apr 25 '25 03:04 NEOOOOOOOOOO

@NEOOOOOOOOOO , do you know how to deploy UI-TARS-1.5-7B locally using vLLM? there is no documentation about local deployment. thanks

XingWang1234 avatar Apr 28 '25 07:04 XingWang1234

@XingWang1234

Install vllm via pip install vllm and start the model with Python:

CUDA_VISIBLE_DEVICES=0,1 nohup python -m vllm.entrypoints.openai.api_server \
  --served-model-name ui-tars \
  --model bytedance-research/UI-TARS-1.5-7B \
  --port 10006 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.85 \
  --limit-mm-per-prompt "image=5" \
  --max-model-len 20000 \
  --disable-custom-all-reduce > ui-tars.log 2>&1 &

NEOOOOOOOOOO avatar Apr 28 '25 08:04 NEOOOOOOOOOO

@NEOOOOOOOOOO thank you so much.

XingWang1234 avatar Apr 28 '25 09:04 XingWang1234