
Deployment fails with `ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")`

troublesprouter opened this issue 7 months ago • 2 comments

`ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")`

 

This happens when deploying on Hugging Face with the parameters specified in the docs. Any idea how to fix it?

Endpoint encountered an error. You can try restarting it using the "retry" button above. Check logs for more details.

```
[Server message] Endpoint failed to start. Exit code: 1. Reason:
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        65537,
    ),
    max_total_tokens: Some(
        65536,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        65536,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            0,
        ],
    ),
    hostname: "r-facehuggerthesecond-ui-tars-1-5-7b-wmu-5lac85uc-08b4b-8ds4z",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/repository/cache",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: true,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 8000000,
    enable_prefill_logprobs: false,
    graceful_termination_timeout: 90,
}"},"target":"text_generation_launcher"}
{"timestamp":"2025-05-07T18:04:50.881716Z","level":"INFO","fields":{"message":"Disabling prefix caching because of VLM model"},"target":"text_generation_launcher"}
{"timestamp":"2025-05-07T18:04:50.881753Z","level":"INFO","fields":{"message":"Using attention flashinfer - Prefix caching 0"},"target":"text_generation_launcher"}
Error: ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")
```

troublesprouter • May 07 '25 18:05

https://github.com/bytedance/UI-TARS/pull/146

Yeah, they wouldn't merge the PR.
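
Until that fix lands in the docs, the error message itself points at a workaround: TGI enforces a strict inequality because `max_total_tokens` is the combined budget for the prompt and the generated tokens, so `max_input_tokens` has to leave room for at least one output token. A minimal sketch of an endpoint-level override, assuming a TGI-backed Inference Endpoint where these limits are exposed as container environment variables (65535 is illustrative; any value strictly below `max_total_tokens` passes the check):

```sh
# Hedged workaround sketch: override the endpoint's container environment so
# the prompt budget is strictly smaller than the total token budget.
MAX_INPUT_LENGTH=65535   # was 65537 in the failing config; must be < MAX_TOTAL_TOKENS
MAX_TOTAL_TOKENS=65536   # covers prompt tokens plus generated tokens
```

If you run the TGI container yourself instead, the equivalent launcher flags are `--max-input-tokens 65535 --max-total-tokens 65536`.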

matt-wai • May 07 '25 22:05

> #146
>
> Yeah, they wouldn't merge the PR.

Hi, thanks for your contribution. The pull request has been merged!

Taoran-Lu • May 11 '25 09:05