# Shared volume using mountpoint-s3, permissions issues
### System Info

```
INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 4ee0a0c4010b6e000f176977648aa1749339e8cb
Docker label: sha-4ee0a0c
nvidia-smi:
Fri Apr 26 08:28:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P0 34W / 70W | 1270MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
2024-04-26T08:28:07.895343Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-77c86c8c47-msfnf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: true }
```
Running in EKS 1.28
### Information
- [ ] Docker
- [ ] The CLI directly
### Tasks
- [ ] An officially supported command
- [ ] My own modifications
### Reproduction
Hello,

I wanted to use [mountpoint-s3](https://github.com/awslabs/mountpoint-s3-csi-driver) to have a shared, pre-existing volume that stores all the models, so they don't have to be downloaded every time a pod starts.

I'm running on Kubernetes (EKS), and when I run the command inside the pod it throws these errors:
```console
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T08:24:24.320980Z INFO text_generation_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-77c86c8c47-msfnf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T08:24:24.321455Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T08:24:37.101264Z ERROR download: text_generation_launcher: Download encountered an error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/adapter_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
metadata = get_hf_file_metadata(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1624, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 402, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 426, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 280, in hf_raise_for_status
raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-662b64c2-28f5f303465cdcad4d92d8da;37a2346f-db23-40c5-99b8-dfacd1858ec4)
Entry Not Found for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/adapter_config.json.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 141, in download_weights
adapter_config_filename = hf_hub_download(
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1259, in hf_hub_download
no_exist_file_path.touch()
File "/opt/conda/lib/python3.10/pathlib.py", line 1168, in touch
self._accessor.touch(self, mode, exist_ok)
File "/opt/conda/lib/python3.10/pathlib.py", line 331, in touch
fd = os.open(path, flags, mode)
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/adapter_config.json'
Error: DownloadError
```
That's strange, because it seems the root user is allowed to do almost anything: I can create and delete files. The only thing it can't do is change existing permissions.
```console
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# echo "eaibe" >> /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# cat /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
eaibe
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# rm /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# echo "eaibe" >> /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
root@text-generation-inference-77c86c8c47-msfnf:/usr/src# chmod 777 /data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar
chmod: changing permissions of '/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/foobar': Operation not permitted
```
First question: does the application run as another user? It doesn't seem to. Do you see any reason for this behavior?
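A few checks that might help narrow this down (as far as I understand, mountpoint-s3 exposes fixed file permissions and does not support `chmod` at all, which would explain the failure above; the `probe` filename below is hypothetical, the `.no_exist` path is copied from the traceback):

```console
# Confirm which user the launcher and the Python server actually run as,
# and how /data is mounted.
id
ps aux | grep text-generation
mount | grep /data

# Reproduce the exact call that fails in the traceback (Path.touch), to see
# whether it behaves differently from a plain shell redirection on this mount.
python -c "from pathlib import Path; Path('/data/models--mistralai--Mistral-7B-Instruct-v0.2/.no_exist/41b61a33a2483885c981aa79e0df6b32407ed873/probe').touch()"
```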
### Expected behavior
Downloading and running the application
Maybe it would be better to use this volume read-only, so I would just need to make the models available in the bucket before starting the process. Could you please guide me through the procedure to provision the S3 bucket?
Thanks :)
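For context, a minimal static-provisioning setup for the mountpoint-s3 CSI driver might look like the sketch below; the TGI pod would then mount the claim at `/data`. The bucket name, region and resource names are placeholders, and mounting read-only is my assumption here, not something verified in this issue.

```console
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-s3-pv
spec:
  capacity:
    storage: 100Gi            # ignored by the S3 CSI driver, but required by Kubernetes
  accessModes:
    - ReadOnlyMany
  mountOptions:
    - read-only               # passed through to mount-s3
    - region <aws_region>
  csi:
    driver: s3.csi.aws.com
    volumeHandle: models-s3-volume
    volumeAttributes:
      bucketName: <bucket_name>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-s3-claim
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""        # empty: static provisioning, bind to the PV above
  resources:
    requests:
      storage: 100Gi
  volumeName: models-s3-pv
EOF
```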
OK, I managed to do what I wanted:
- clone the model:

```console
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
```

- sync the model to S3:

```console
aws s3 sync Mistral-7B-Instruct-v0.2 s3://<bucket_name>/Mistral-7B-Instruct-v0.2
```

- run it inside the pod:

```console
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T12:57:37.746280Z INFO text_generation_launcher: Args { model_id: "/data/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-58d9869995-gxzx2", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T12:57:37.746720Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T12:57:48.114689Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-26T12:57:50.144159Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-26T12:57:50.144763Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-26T12:58:00.242683Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-26T12:58:02.873865Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
rank=0
2024-04-26T12:58:02.873894Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0
2024-04-26T12:58:02.944252Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-26T12:58:02.944282Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```
I have another error that might not be related. I'm going to solve that before closing this issue.
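For reference, a shard being "signaled to shutdown with signal 9" is usually the kernel OOM killer rather than TGI itself; a quick way to check (pod name taken from the log above):

```console
# If the container hit its memory limit, its last state should report OOMKilled,
# and the pod events usually mention it as well.
kubectl describe pod text-generation-inference-58d9869995-gxzx2 | grep -iA5 'last state'
kubectl get events --field-selector involvedObject.name=text-generation-inference-58d9869995-gxzx2
```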
OK, my first issue was caused by insufficient memory allocation. Now I get this error:

```
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
```
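`HeaderTooLarge` from safetensors usually means the files on the volume are git-lfs pointer stubs rather than the real weights, which is what a plain `git clone` without git-lfs installed produces. A quick check against the directory cloned in the steps above:

```console
# Real weight shards are several GB; LFS pointer stubs are ~130-byte text files
# starting with "version https://git-lfs.github.com/spec/v1".
du -h Mistral-7B-Instruct-v0.2/*.safetensors
head -c 120 Mistral-7B-Instruct-v0.2/*.safetensors
```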
Well, I managed to download the model the recommended way, with huggingface-cli:
```console
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2
aws s3 sync /home/smana/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2 s3://<bucket>/models--mistralai--Mistral-7B-Instruct-v0.2
```
When the pod starts I still get permission errors :/
```console
text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
...
2024-04-26T15:37:48.725974Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
...
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/tmp_7e2fd113-2af9-4a1a-bf0e-22d328d4bc8b'
```
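One approach that might avoid these writes entirely: download with `--local-dir` so the model ends up as a plain directory (no `.no_exist`/`tmp_*` bookkeeping in a hub cache on the S3 mount) and point `--model-id` at that path, like the flat layout that already got past the download step earlier. A sketch, reusing the placeholder bucket name (depending on the huggingface_hub version, `--local-dir-use-symlinks False` may be needed so the files are real copies rather than symlinks):

```console
# Download the model into a flat local directory instead of the hub cache layout.
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir ./Mistral-7B-Instruct-v0.2

# Sync the flat directory to the bucket backing the mountpoint-s3 volume.
aws s3 sync ./Mistral-7B-Instruct-v0.2 s3://<bucket>/Mistral-7B-Instruct-v0.2

# Inside the pod: load straight from the (ideally read-only) mount.
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
```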
It works much better with EFS storage, but I'll leave this issue open in case someone finds a solution for the S3 mountpoint.
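For anyone who goes the EFS route instead, the usual dynamic-provisioning pattern with the EFS CSI driver looks roughly like this (the filesystem id and resource names are placeholders, not taken from this setup):

```console
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-models
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # one EFS access point per volume
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-models
  resources:
    requests:
      storage: 100Gi                # EFS is elastic; the value is required but not enforced
EOF
```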
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.