Deployment fails with: ArgumentValidation("`max_input_tokens`(65537) must be < `max_total_tokens`(65536)")
: ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")
This happens when deploying on Hugging Face with the parameters specified in the docs. Any idea how to fix it?
Endpoint encountered an error.
You can try restarting it using the "retry" button above. Check logs for more details.
[Server message] Endpoint failed to start
Exit code: 1. Reason:
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        65537,
    ),
    max_total_tokens: Some(
        65536,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        65536,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            0,
        ],
    ),
    hostname: "r-facehuggerthesecond-ui-tars-1-5-7b-wmu-5lac85uc-08b4b-8ds4z",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/repository/cache",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: true,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 8000000,
    enable_prefill_logprobs: false,
    graceful_termination_timeout: 90,
}
{"timestamp":"2025-05-07T18:04:50.881716Z","level":"INFO","fields":{"message":"Disabling prefix caching because of VLM model"},"target":"text_generation_launcher"}
{"timestamp":"2025-05-07T18:04:50.881753Z","level":"INFO","fields":{"message":"Using attention flashinfer - Prefix caching 0"},"target":"text_generation_launcher"}
Error: ArgumentValidation("max_input_tokens(65537) must be < max_total_tokens(65536)")
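The error itself states the constraint: the input window must be strictly smaller than the total token budget. In the dump above, `max_input_length` is 65537 while `max_total_tokens` is 65536, so lowering the input side to 65535 or less satisfies the check. Below is a minimal sketch of one way to apply that on an existing Inference Endpoint by overriding the container env vars that the TGI launcher reads; the endpoint name and image URL are placeholders, not values from this thread:

```python
# Minimal sketch: set MAX_INPUT_LENGTH strictly below MAX_TOTAL_TOKENS
# on an existing Inference Endpoint. Endpoint name and image URL are
# placeholders (assumptions), not taken from this thread.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("ui-tars-1-5-7b")  # hypothetical endpoint name

endpoint.update(
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            # TGI validates max_input_tokens < max_total_tokens (strict).
            "MAX_INPUT_LENGTH": "65535",
            "MAX_TOTAL_TOKENS": "65536",
        },
    },
)
endpoint.wait()  # block until the endpoint is back in a running state
```

The same relationship holds if you launch TGI directly: pass an input limit below the total, e.g. `--max-input-tokens 65535 --max-total-tokens 65536`.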
https://github.com/bytedance/UI-TARS/pull/146
Yeah, they wouldn't merge the PR.
Hi, thanks for your contribution! The pull request has been merged.