Process hangs in local run
(text-generation-inference) [email protected]:~/tgi_test/text-generation-inference$ text-generation-launcher
2024-04-29T11:11:11.331114Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_tokens: None, max_input_length: None, max_total_tokens: None, waiting_served_ratio: 1.2, max_batch_prefill_tokens: None, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, cuda_graphs: None, hostname: "0.0.0.0", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false, max_client_batch_size: 4 }
2024-04-29T11:11:11.331492Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-04-29T11:11:11.331507Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-04-29T11:11:11.331516Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-04-29T11:11:11.331530Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-04-29T11:11:11.331725Z INFO download: text_generation_launcher: Starting download process.
2024-04-29T11:11:15.228645Z INFO text_generation_launcher: Download file: model.safetensors
I tried to run text-generation-inference locally, but the process hangs. What is the usual cause of this problem? For reference, all arguments are left at their defaults.
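For context, the defaulted token budgets printed in the log fit a simple pattern. The relationships below are inferred from this log output alone, not from the TGI source, so treat them as an observation rather than a documented invariant:

```python
# Relationships between the defaulted token budgets, as read off the
# launcher log above (an inference from this log, not a TGI guarantee).
max_total_tokens = 4096
max_input_tokens = max_total_tokens - 1           # logged default: 4095
max_batch_prefill_tokens = max_input_tokens + 50  # logged default: 4145

print(max_input_tokens, max_batch_prefill_tokens)
```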
Hi @Hojun-Son, I just ran the same command and was able to start a server, so this may be a latent networking issue while downloading the model.
Also, please make sure to specify the model ID and other parameters at startup.
I'd recommend downloading the model first via text-generation-server download-weights HuggingFaceM4/idefics2-8b
and then running it via text-generation-launcher --model-id HuggingFaceM4/idefics2-8b
I hope these commands work for you.
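Putting the two suggested commands together, a minimal sketch of the workflow (assuming the text-generation-inference environment is installed and both binaries are on PATH):

```shell
# Pre-fetch the weights so a slow or stalled download doesn't make the
# launcher appear to hang at "Download file: model.safetensors"
text-generation-server download-weights HuggingFaceM4/idefics2-8b

# Then launch with the model specified explicitly
text-generation-launcher --model-id HuggingFaceM4/idefics2-8b
```

Separating the download from the launch makes it easy to tell whether the hang is in fetching the weights or in starting the server itself.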