Downloading stuck for some models
System Info
Using the latest Docker image
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
For some models from the Hub, I consistently run into this error. With HF_HUB_ENABLE_HF_TRANSFER enabled I get the first error below; with it disabled, the download just gets stuck: it runs for a while and there is network traffic, but afterwards it simply hangs.
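(For reproducibility, these are the shell variables the commands below assume; model and num_shard come from the Args log, while the $volume path is a placeholder for any host directory used as the cache mount.)
model=OpenAssistant/stablelm-7b-sft-v7-epoch-3
num_shard=1
volume=$PWD/data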
sudo docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
2023-05-22T16:52:14.163979Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/stablelm-7b-sft-v7-epoch-3", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-22T16:52:14.164066Z INFO text_generation_launcher: Starting download process.
2023-05-22T16:52:22.052758Z WARN download: text_generation_launcher: No safetensors weights found for model OpenAssistant/stablelm-7b-sft-v7-epoch-3 at revision None. Downloading PyTorch weights.
2023-05-22T16:52:22.278050Z INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
Error: DownloadError
2023-05-22T17:07:03.022039Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 486, in http_get
    download(url, temp_file.name, max_files, chunk_size, headers=headers)
Exception: Error while downloading: reqwest::Error { kind: Body, source: hyper::Error(Body, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 136, in download_weights
    local_pt_files = utils.download_weights(pt_filenames, model_id, revision)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 156, in download_weights
    file = download_file(filename)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 141, in download_file
    local_file = hf_hub_download(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1347, in hf_hub_download
    http_get(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 495, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
sudo docker run -e HF_HUB_ENABLE_HF_TRANSFER=False --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
2023-05-22T16:02:24.868158Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/stablelm-7b-sft-v7-epoch-3", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-22T16:02:24.868614Z INFO text_generation_launcher: Starting download process.
2023-05-22T16:02:33.534401Z WARN download: text_generation_launcher: No safetensors weights found for model OpenAssistant/stablelm-7b-sft-v7-epoch-3 at revision None. Downloading PyTorch weights.
2023-05-22T16:02:33.969078Z INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
^C
2023-05-22T16:52:02.995872Z INFO text_generation_launcher: Waiting for download process to gracefully shutdown
2023-05-22T16:52:03.027326Z INFO text_generation_launcher: Download process terminated
2023-05-22T16:52:03.027386Z INFO text_generation_launcher: Shutting down shards
2023-05-22T16:52:03.027486Z INFO text_generation_launcher: Starting shard 0
2023-05-22T16:52:03.029556Z INFO text_generation_launcher: Shard 0 terminated
Expected behavior
Weights should download properly. The exact same command works perfectly with other model_ids.
Can you try downloading the weights directly?
text-generation-server download-weights $model
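If you only have the container, a possible way to run the same subcommand is to override the image entrypoint (an untested sketch; it assumes the text-generation-server binary is on PATH inside the image, and mounts the same cache volume as above):
sudo docker run -v $volume:/data --entrypoint text-generation-server ghcr.io/huggingface/text-generation-inference:latest download-weights $model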
Maybe also provide the model id if it's public? It could be a model-specific issue.
Can I do that with Docker? Unfortunately I couldn't get it running with Docker, and I was unable to build the server locally.
The model-id is OpenAssistant/stablelm-7b-sft-v7-epoch-3, but the same has been happening with various other models as well (while some others work fine).
Is it possible that the ones that are failing are private? If so, you need to set HUGGING_FACE_HUB_TOKEN:
docker run -e HUGGING_FACE_HUB_TOKEN=.. ...rest
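Spelled out against the reproduction command above (assuming the token has been exported as $HF_TOKEN on the host):
sudo docker run -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes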
I had the same problem; using --net=host solved it for me:
sudo docker run --net=host --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
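Note that with --net=host Docker ignores the -p 8080:80 mapping, so the server listens on its default port 80 directly. An untested variant that keeps the service on host port 8080 would drop the mapping and use the launcher's --port flag (visible in the Args log) instead:
sudo docker run --net=host --gpus all --shm-size 1g -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes --port 8080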
How can I solve this problem? I am using an AWS SageMaker instance.
On my SageMaker instance, I had run out of disk space and had to change the Transformers cache directory.
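For anyone hitting the same thing, one way to relocate the cache (the paths below are assumptions; point them at whatever volume has space) is to export the cache environment variables before starting the download:
mkdir -p /home/ec2-user/SageMaker/hf-cache
export TRANSFORMERS_CACHE=/home/ec2-user/SageMaker/hf-cache
export HUGGINGFACE_HUB_CACHE=/home/ec2-user/SageMaker/hf-cache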
OK, I will close this since the issue seems solved; we can keep adding comments if other potential solutions are found.
We are seeing the same issue with vilsonrodrigues/falcon-7b-instruct-sharded. It always fails at exactly the same spot in the download; the stuck process's last visible syscall (strace-style output) is:
write(2, "\rDownloading (\342\200\246)of-00015.safet"..., 97