Process hangs in local run
(text-generation-inference) [email protected]:~/tgi_test/text-generation-inference$ text-generation-launcher
2024-04-29T11:11:11.331114Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_tokens: None, max_input_length: None, max_total_tokens: None, waiting_served_ratio: 1.2, max_batch_prefill_tokens: None, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, cuda_graphs: None, hostname: "0.0.0.0", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false, max_client_batch_size: 4 }
2024-04-29T11:11:11.331492Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-04-29T11:11:11.331507Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-04-29T11:11:11.331516Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-04-29T11:11:11.331530Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-04-29T11:11:11.331725Z INFO download: text_generation_launcher: Starting download process.
2024-04-29T11:11:15.228645Z INFO text_generation_launcher: Download file: model.safetensors
I tried to run text-generation-inference locally, but the process hangs. What is the usual cause of this problem? For reference, all arguments are left at their defaults.
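For context, the defaulted token budgets printed in the log fit a simple pattern. The relationships below are inferred from this log output alone, not from the TGI source, so treat them as an observation rather than a documented invariant:

```python
# Relationships between the defaulted token budgets, as read off the
# launcher log above (an inference from this log, not a TGI guarantee).
max_total_tokens = 4096
max_input_tokens = max_total_tokens - 1           # logged default: 4095
max_batch_prefill_tokens = max_input_tokens + 50  # logged default: 4145

print(max_input_tokens, max_batch_prefill_tokens)
```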
Hi @Hojun-Son, I just ran the same command and was able to start a server, so this may be a latent networking issue while downloading the model.
Also, please make sure to specify the model ID and other parameters at startup.
I'd recommend downloading the model first via text-generation-server download-weights HuggingFaceM4/idefics2-8b
and then running it via text-generation-launcher --model-id HuggingFaceM4/idefics2-8b
I hope these commands work for you.
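Putting the two suggested commands together, a minimal sketch of the workflow (assuming the text-generation-inference environment is installed and both binaries are on PATH):

```shell
# Pre-fetch the weights so a slow or stalled download doesn't make the
# launcher appear to hang at "Download file: model.safetensors"
text-generation-server download-weights HuggingFaceM4/idefics2-8b

# Then launch with the model specified explicitly
text-generation-launcher --model-id HuggingFaceM4/idefics2-8b
```

Separating the download from the launch makes it easy to tell whether the hang is in fetching the weights or in starting the server itself.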