
VRAM Requirements for 72B?

matbeedotcom opened this issue 1 year ago · 5 comments

I just barely fit the 7B model in 17 GB using vLLM with an fp8 KV cache. What would the requirements be for the 72B model?
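For reference, a minimal vLLM launch with an fp8 KV cache looks roughly like this (a sketch using vLLM's standard CLI flags; exact flag support depends on your vLLM version and hardware):

```shell
# Serve the 7B model with the KV cache quantized to fp8 to save VRAM.
vllm serve bytedance-research/UI-TARS-7B-DPO \
  --kv-cache-dtype fp8 \
  --gpu-memory-utilization 0.95
```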

matbeedotcom avatar Jan 23 '25 23:01 matbeedotcom

Hi, we cannot guarantee model performance if you use fp8. If local deployment is too difficult, you can follow our instructions to set up a cloud server on Hugging Face Inference Endpoints.

AHEADer avatar Jan 24 '25 01:01 AHEADer

I think the VRAM requirement for the 72B model would be more than 160 GB, since we use bf16 precision and the model weights alone take about 144 GB.
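That estimate follows from simple arithmetic (weights only; the KV cache, activations, and framework overhead come on top, which is why the practical figure is quoted above 160 GB):

```python
# Rough VRAM estimate for a 72B-parameter model served in bf16.
params = 72e9          # 72 billion parameters
bytes_per_param = 2    # bf16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"weights: {weights_gb:.0f} GB")  # 144 GB
```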

AHEADer avatar Jan 24 '25 01:01 AHEADer

I was able to get the 7B to load with full KV-cache precision:

vllm serve bytedance-research/UI-TARS-7B-DPO --enforce-eager --gpu-memory-utilization 0.99 --limit-mm-per-prompt "image=30"

Have you found a method to run the 72B model on CPU? I have an AMD EPYC build with 512 GB of RAM. Sadly, I have not been able to find any reasonably priced GPU setup that would be feasible for ~200 GB of VRAM.

matbeedotcom avatar Jan 25 '25 19:01 matbeedotcom

> Have you found a method to run the 72B model on CPU? I have an AMD EPYC build with 512 GB of RAM. Sadly, I have not been able to find any reasonably priced GPU setup that would be feasible for ~200 GB of VRAM.

Maybe look at using multiple Radeon RX 7900 XTX cards. Take a look at this repo and this article if it piques your interest.

Etherdrake avatar Feb 13 '25 02:02 Etherdrake

I'm serving with 8 A100 (80 GB) GPUs and max_token_len=16384. When I put 5 images into the chat template, the vLLM server dies. Sometimes it doesn't die right away, but it dies after a few steps. Is this happening to everyone?

If you want to run inference with 5 history images as the paper describes, how much GPU memory is needed for stable serving?
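For an 8-GPU setup like the one described, a tensor-parallel launch might look like the sketch below (flags assumed from vLLM's standard CLI; capping `--limit-mm-per-prompt` at the 5 history images and leaving some VRAM headroom can help avoid the out-of-memory crashes mentioned above):

```shell
# Shard the 72B model across 8 GPUs and bound per-request image count.
vllm serve bytedance-research/UI-TARS-72B-DPO \
  --tensor-parallel-size 8 \
  --max-model-len 16384 \
  --limit-mm-per-prompt "image=5" \
  --gpu-memory-utilization 0.90
```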

kig1929 avatar Mar 28 '25 07:03 kig1929