Downloading stuck for some models
System Info
Using the latest Docker image
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
For some models from the Hub, I consistently run into this error. With HF_HUB_ENABLE_HF_TRANSFER enabled I get the first error below; with it disabled, the download just gets stuck: it runs for a while and there is network traffic, but afterwards it simply hangs.
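(For reproducibility, these are the shell variables the commands below assume; model and num_shard come from the Args log, while the $volume path is a placeholder for any host directory used as the cache mount.)
model=OpenAssistant/stablelm-7b-sft-v7-epoch-3
num_shard=1
volume=$PWD/data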
sudo docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
2023-05-22T16:52:14.163979Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/stablelm-7b-sft-v7-epoch-3", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-22T16:52:14.164066Z INFO text_generation_launcher: Starting download process.
2023-05-22T16:52:22.052758Z WARN download: text_generation_launcher: No safetensors weights found for model OpenAssistant/stablelm-7b-sft-v7-epoch-3 at revision None. Downloading PyTorch weights.
2023-05-22T16:52:22.278050Z INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
Error: DownloadError
2023-05-22T17:07:03.022039Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 486, in http_get
    download(url, temp_file.name, max_files, chunk_size, headers=headers)
Exception: Error while downloading: reqwest::Error { kind: Body, source: hyper::Error(Body, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) }

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 136, in download_weights
    local_pt_files = utils.download_weights(pt_filenames, model_id, revision)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 156, in download_weights
    file = download_file(filename)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 141, in download_file
    local_file = hf_hub_download(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1347, in hf_hub_download
    http_get(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 495, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
sudo docker run -e HF_HUB_ENABLE_HF_TRANSFER=False --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
2023-05-22T16:02:24.868158Z INFO text_generation_launcher: Args { model_id: "OpenAssistant/stablelm-7b-sft-v7-epoch-3", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-22T16:02:24.868614Z INFO text_generation_launcher: Starting download process.
2023-05-22T16:02:33.534401Z WARN download: text_generation_launcher: No safetensors weights found for model OpenAssistant/stablelm-7b-sft-v7-epoch-3 at revision None. Downloading PyTorch weights.
2023-05-22T16:02:33.969078Z INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
^C
2023-05-22T16:52:02.995872Z INFO text_generation_launcher: Waiting for download process to gracefully shutdown
2023-05-22T16:52:03.027326Z INFO text_generation_launcher: Download process terminated
2023-05-22T16:52:03.027386Z INFO text_generation_launcher: Shutting down shards
2023-05-22T16:52:03.027486Z INFO text_generation_launcher: Starting shard 0
2023-05-22T16:52:03.029556Z INFO text_generation_launcher: Shard 0 terminated
Expected behavior
Weights should download properly. The exact same command works perfectly with other model_ids.
Can you try downloading the weights directly?
text-generation-server download-weights $model
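If you only have the container, a possible way to run the same subcommand is to override the image entrypoint (an untested sketch; it assumes the text-generation-server binary is on PATH inside the image, and mounts the same cache volume as above):
sudo docker run -v $volume:/data --entrypoint text-generation-server ghcr.io/huggingface/text-generation-inference:latest download-weights $model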
Maybe also provide the model id if it's public? It could be a model-specific issue.
Can I do that with Docker? Unfortunately I couldn't get it running with Docker, and I was unable to build the server locally.
The model-id is OpenAssistant/stablelm-7b-sft-v7-epoch-3, but the same has been happening with various other models as well (while some others work fine).
Is it possible that the ones that are failing are private? If so, you need to set HUGGING_FACE_HUB_TOKEN:
docker run -e HUGGING_FACE_HUB_TOKEN=.. ...rest
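Spelled out against the reproduction command above (assuming the token has been exported as $HF_TOKEN on the host):
sudo docker run -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes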
I had the same problem; using --net=host solved it for me:
sudo docker run --net=host --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes
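Note that with --net=host Docker ignores the -p 8080:80 mapping, so the server listens on its default port 80 directly. An untested variant that keeps the service on host port 8080 would drop the mapping and use the launcher's --port flag (visible in the Args log) instead:
sudo docker run --net=host --gpus all --shm-size 1g -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard --quantize bitsandbytes --port 8080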
How can I solve this problem? I am using an AWS SageMaker instance.
On my SageMaker instance, I had run out of disk space and had to change the Transformers cache directory.
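For anyone hitting the same thing, one way to relocate the cache (the paths below are assumptions; point them at whatever volume has space) is to export the cache environment variables before starting the download:
mkdir -p /home/ec2-user/SageMaker/hf-cache
export TRANSFORMERS_CACHE=/home/ec2-user/SageMaker/hf-cache
export HUGGINGFACE_HUB_CACHE=/home/ec2-user/SageMaker/hf-cache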
OK, I will close this since the issue seems solved; we can keep adding comments if other potential solutions are found.
We are seeing the same issue with vilsonrodrigues/falcon-7b-instruct-sharded. It always fails at exactly the same spot in the download; the stuck process's last visible syscall (strace-style output) is:
write(2, "\rDownloading (\342\200\246)of-00015.safet"..., 97