Deployment fails with: ArgumentValidation("`max_input_tokens`(65537) must be < `max_total_tokens`(65536)")
: ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")
This happens when deploying on Hugging Face with the parameters specified in the docs. Any idea how to fix it?
Endpoint encountered an error.
You can try restarting it using the "retry" button above. Check logs for more details.
[Server message] Endpoint failed to start
Exit code: 1. Reason:
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        65537,
    ),
    max_total_tokens: Some(
        65536,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        65536,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            0,
        ],
    ),
    hostname: "r-facehuggerthesecond-ui-tars-1-5-7b-wmu-5lac85uc-08b4b-8ds4z",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/repository/cache",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: true,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 8000000,
    enable_prefill_logprobs: false,
    graceful_termination_timeout: 90,
}
{"timestamp":"2025-05-07T18:04:50.881716Z","level":"INFO","fields":{"message":"Disabling prefix caching because of VLM model"},"target":"text_generation_launcher"}
{"timestamp":"2025-05-07T18:04:50.881753Z","level":"INFO","fields":{"message":"Using attention flashinfer - Prefix caching 0"},"target":"text_generation_launcher"}
Error: ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")
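The error itself states the constraint: the input window must be strictly smaller than the total token budget. In the dump above, `max_input_length` is 65537 while `max_total_tokens` is 65536, so lowering the input side to 65535 or less satisfies the check. Below is a minimal sketch of one way to apply that on an existing Inference Endpoint by overriding the container env vars that the TGI launcher reads; the endpoint name and image URL are placeholders, not values from this thread:

```python
# Minimal sketch: set MAX_INPUT_LENGTH strictly below MAX_TOTAL_TOKENS
# on an existing Inference Endpoint. Endpoint name and image URL are
# placeholders (assumptions), not taken from this thread.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("ui-tars-1-5-7b")  # hypothetical endpoint name

endpoint.update(
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            # TGI validates max_input_tokens < max_total_tokens (strict).
            "MAX_INPUT_LENGTH": "65535",
            "MAX_TOTAL_TOKENS": "65536",
        },
    },
)
endpoint.wait()  # block until the endpoint is back in a running state
```

The same relationship holds if you launch TGI directly: pass an input limit below the total, e.g. `--max-input-tokens 65535 --max-total-tokens 65536`.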
https://github.com/bytedance/UI-TARS/pull/146
Yeah, they wouldn't merge the PR.
Hi, thanks for your contribution! The pull request has been merged.