# Shared volume using mountpoint-s3, permissions issues
### System Info

```
INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 4ee0a0c4010b6e000f176977648aa1749339e8cb
Docker label: sha-4ee0a0c
nvidia-smi:
Fri Apr 26 08:28:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P0 34W / 70W | 1270MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
2024-04-26T08:28:07.895343Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-77c86c8c47-msfnf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: true }
```
Running in EKS 1.28
### Information
- [ ] Docker
- [ ] The CLI directly
### Tasks
- [ ] An officially supported command
- [ ] My own modifications
### Reproduction
Hello,

I wanted to use [mountpoint-s3](https://github.com/awslabs/mountpoint-s3-csi-driver) to have a shared, pre-existing volume that stores all the models, so they don't have to be downloaded every time a pod starts.

I'm running on Kubernetes (EKS), and when I run the command inside the pod it throws these errors:
```console
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T08:24:24.320980Z INFO text_generation_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-77c86c8c47-msfnf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T08:24:24.321455Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T08:24:37.101264Z ERROR download: text_generation_launcher: Download encountered an error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/adapter_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
metadata = get_hf_file_metadata(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1624, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 402, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 426, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 280, in hf_raise_for_status
raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-662b64c2-28f5f303465cdcad4d92d8da;37a2346f-db23-40c5-99b8-dfacd1858ec4)
Entry Not Found for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/adapter_config.json.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 141, in download_weights
adapter_config_filename = hf_hub_download(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1259, in hf_hub_download
no_exist_file_path.touch()
File "/opt/conda/lib/python3.10/pathlib.py", line 1168, in touch
self._accessor.touch(self, mode, exist_ok)
File "/opt/conda/lib/python3.10/pathlib.py", line 331, in touch
fd = os.open(path, flags, mode)
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/adapter_config.json'
Error: DownloadError
```
That's strange, because it seems the root user is allowed to do almost anything: I can create and delete files. The only thing it can't do is change existing permissions.
```console
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# echo "eaibe" >> /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# cat /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
eaibe
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# rm /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# echo "eaibe" >> /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# chmod 777 /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
chmod: changing permissions of '/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar': Operation not permitted
```
First question: does the application run as another user? It doesn't seem to. Do you see any reason for this behavior?
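A few checks that might help narrow this down (as far as I understand, mountpoint-s3 exposes fixed file permissions and does not support `chmod` at all, which would explain the failure above; the `probe` filename below is hypothetical, the `.no_exist` path is copied from the traceback):

```console
# Confirm which user the launcher and the Python server actually run as,
# and how /data is mounted.
id
ps aux | grep text-generation
mount | grep /data

# Reproduce the exact call that fails in the traceback (Path.touch), to see
# whether it behaves differently from a plain shell redirection on this mount.
python -c "from pathlib import Path; Path('/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/probe').touch()"
```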
### Expected behavior
Downloading and running the application
Maybe it would be better to use this volume read-only, so I would just need to make the models available in the bucket before starting the process. Could you please guide me through the procedure to provision the S3 bucket?
Thanks :)
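For context, a minimal static-provisioning setup for the mountpoint-s3 CSI driver might look like the sketch below; the TGI pod would then mount the claim at `/data`. The bucket name, region and resource names are placeholders, and mounting read-only is my assumption here, not something verified in this issue.

```console
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-s3-pv
spec:
  capacity:
    storage: 100Gi            # ignored by the S3 CSI driver, but required by Kubernetes
  accessModes:
    - ReadOnlyMany
  mountOptions:
    - read-only               # passed through to mount-s3
    - region <aws_region>
  csi:
    driver: s3.csi.aws.com
    volumeHandle: models-s3-volume
    volumeAttributes:
      bucketName: <bucket_name>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-s3-claim
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""        # empty: static provisioning, bind to the PV above
  resources:
    requests:
      storage: 100Gi
  volumeName: models-s3-pv
EOF
```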
OK, I managed to do what I wanted:
- clone the model:

```console
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
```

- sync the model to S3:

```console
aws s3 sync Mistral-7B-Instruct-v0.2 s3://<bucket_name>/Mistral-7B-Instruct-v0.2
```

- run it inside the pod:

```console
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T12:57:37.746280Z INFO text_generation_launcher: Args { model_id: "/data/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-58d9869995-gxzx2", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T12:57:37.746720Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T12:57:48.114689Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-26T12:57:50.144159Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-26T12:57:50.144763Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-26T12:58:00.242683Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-26T12:58:02.873865Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
rank=0
2024-04-26T12:58:02.873894Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0
2024-04-26T12:58:02.944252Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-26T12:58:02.944282Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```
I have another error that might not be related. I'm going to solve that before closing this issue.
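For reference, a shard being "signaled to shutdown with signal 9" is usually the kernel OOM killer rather than TGI itself; a quick way to check (pod name taken from the log above):

```console
# If the container hit its memory limit, its last state should report OOMKilled,
# and the pod events usually mention it as well.
kubectl describe pod text-generation-inference-58d9869995-gxzx2 | grep -iA5 'last state'
kubectl get events --field-selector involvedObject.name=text-generation-inference-58d9869995-gxzx2
```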
OK, my first issue was caused by insufficient memory allocation. Now I get this error:

```
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
```
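`HeaderTooLarge` from safetensors usually means the files on the volume are git-lfs pointer stubs rather than the real weights, which is what a plain `git clone` without git-lfs installed produces. A quick check against the directory cloned in the steps above:

```console
# Real weight shards are several GB; LFS pointer stubs are ~130-byte text files
# starting with "version https://git-lfs.github.com/spec/v1".
du -h Mistral-7B-Instruct-v0.2/*.safetensors
head -c 120 Mistral-7B-Instruct-v0.2/*.safetensors
```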
Well, I managed to download the model the recommended way, with huggingface-cli:
```console
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2
aws s3 sync /home/smana/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2 s3://<bucket>/models--mistralai--Mistral-7B-Instruct-v0.2
```
When the pod starts I still get permission errors :/
```console
text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
...
2024-04-26T15:37:48.725974Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
...
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/tmp_7e2fd113-2af9-4a1a-bf0e-22d328d4bc8b'
```
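One approach that might avoid these writes entirely: download with `--local-dir` so the model ends up as a plain directory (no `.no_exist`/`tmp_*` bookkeeping in a hub cache on the S3 mount) and point `--model-id` at that path, like the flat layout that already got past the download step earlier. A sketch, reusing the placeholder bucket name (depending on the huggingface_hub version, `--local-dir-use-symlinks False` may be needed so the files are real copies rather than symlinks):

```console
# Download the model into a flat local directory instead of the hub cache layout.
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir ./Mistral-7B-Instruct-v0.2

# Sync the flat directory to the bucket backing the mountpoint-s3 volume.
aws s3 sync ./Mistral-7B-Instruct-v0.2 s3://<bucket>/Mistral-7B-Instruct-v0.2

# Inside the pod: load straight from the (ideally read-only) mount.
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
```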
It works much better with EFS storage, but I'll leave this issue open in case someone finds a solution for the S3 mountpoint.
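For anyone who goes the EFS route instead, the usual dynamic-provisioning pattern with the EFS CSI driver looks roughly like this (the filesystem id and resource names are placeholders, not taken from this setup):

```console
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-models
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # one EFS access point per volume
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-models
  resources:
    requests:
      storage: 100Gi                # EFS is elastic; the value is required but not enforced
EOF
```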
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.