text-generation-inference
[Documentation] Unclear how to use other architectures
In your README you list the optimised architectures and say:

> Other architectures are supported on a best-effort basis using:
>
> `AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`
>
> or
>
> `AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`
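For context, a hedged sketch of what that fallback looks like when called directly from transformers (the helper name and the `trust_remote_code` flag are illustrative assumptions on my part, not something the README specifies):

```python
# Hedged sketch of the README's "best effort" fallback, used directly from
# Python rather than through the launcher. The helper name and the
# trust_remote_code choice are assumptions, not TGI API.
def load_best_effort(model_id: str):
    try:
        from transformers import AutoModelForCausalLM
    except ImportError:
        # transformers is not installed in this environment
        return None
    # device_map="auto" lets accelerate place weights across GPUs/CPU
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        trust_remote_code=True,  # needed for models that ship custom modeling code
    )
```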
Can you explain where we have to do this? I'm trying to run baichuan-inc/baichuan-7B:

```shell
model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:0.8 \
  --model-id $model --num-shard $num_shard
```
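One detail worth checking, separate from the conversion error: the launcher log reports `trust_remote_code: false`, and baichuan-7B ships custom modeling code. A hedged variant of the command, adding the launcher's `--trust-remote-code` flag (this addresses custom-code loading only, not the safetensors conversion failure; built as a string here so it can be inspected before running):

```shell
# Sketch: same command as above, plus --trust-remote-code, since
# baichuan-7B relies on custom modeling code.
model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data
cmd="docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:0.8 \
--model-id $model --num-shard $num_shard --trust-remote-code"
echo "$cmd"   # run with: eval "$cmd"
```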
start.sh (END)
```
2023-06-21T01:41:52.798477Z  INFO text_generation_launcher: Args { model_id: "baichuan-inc/baichuan-7B", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-21T01:41:52.798592Z  INFO text_generation_launcher: Starting download process.
2023-06-21T01:41:56.934136Z  WARN download: text_generation_launcher: No safetensors weights found for model baichuan-inc/baichuan-7B at revision None. Converting PyTorch weights to safetensors.
2023-06-21T01:41:56.934202Z  INFO download: text_generation_launcher: Convert /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/pytorch_model.bin to /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/model.safetensors.
Error: DownloadError
2023-06-21T01:42:03.924426Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layer
```
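The RuntimeError comes from safetensors refusing to serialize tensors that alias the same memory. A stdlib-only sketch of the idea (safetensors itself inspects torch storage pointers; here memoryviews over a shared bytearray stand in for tied tensors, and all names are made up for illustration):

```python
# Stdlib stand-in for safetensors' shared-memory check: group "tensors"
# (memoryviews) by the identity of their underlying buffer, and flag any
# buffer referenced by more than one name -- the analogue of tied weights.
from collections import defaultdict

def find_shared(state_dict):
    by_buffer = defaultdict(list)
    for name, view in state_dict.items():
        by_buffer[id(view.obj)].append(name)  # view.obj is the backing buffer
    return [sorted(names) for names in by_buffer.values() if len(names) > 1]

# Two "weights" backed by the same buffer, one independent.
tied = bytearray(8)
state = {
    "lm_head.weight": memoryview(tied),
    "embed_tokens.weight": memoryview(tied),   # tied to lm_head
    "layers.0.mlp.weight": memoryview(bytearray(8)),
}
print(find_shared(state))  # -> [['embed_tokens.weight', 'lm_head.weight']]
```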
Try with --auto-convert false.
This error happens when trying to convert to safetensors, but the conversion shouldn't be required for non-core models.
This model seems to be sharing its gate_proj weights, yet the modeling code doesn't reflect that: https://huggingface.co/baichuan-inc/baichuan-7B/blob/main/modeling_baichuan.py. Not sure if it's intentional.
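A common manual workaround for this kind of aliasing (hedged: something one would do oneself before conversion, not a TGI feature) is to give every entry its own copy of the data. With a torch state dict the equivalent would be cloning each tensor; the stdlib stand-in below copies the backing buffers:

```python
# Stdlib sketch of "untying" before serialization: give every name its own
# private copy of the data so no two entries share a buffer. With torch
# state dicts the analogue is {k: v.clone().contiguous() for k, v in sd.items()}.
def untie(state_dict):
    return {name: memoryview(bytearray(view)) for name, view in state_dict.items()}

tied = bytearray(b"\x01" * 4)
state = {"lm_head.weight": memoryview(tied), "embed.weight": memoryview(tied)}

untied = untie(state)
# After copying, the two entries no longer share a backing buffer...
assert untied["lm_head.weight"].obj is not untied["embed.weight"].obj
# ...but the bytes are identical.
assert bytes(untied["lm_head.weight"]) == bytes(untied["embed.weight"])
```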
Thanks @Narsil.
I tried `--auto-convert`, but it's not in the args?
```
error: unexpected argument '--auto-convert' found

Usage: text-generation-launcher <--model-id <MODEL_ID>|--revision <REVISION>|--sharded <SHARDED>|--num-shard <NUM_SHARD>|--quantize <QUANTIZE>|--trust-remote-code|--max-concurrent-requests <MAX_CONCURRENT_REQUESTS>|--max-best-of <MAX_BEST_OF>|--max-stop-sequences <MAX_STOP_SEQUENCES>|--max-input-length <MAX_INPUT_LENGTH>|--max-total-tokens <MAX_TOTAL_TOKENS>|--max-batch-size <MAX_BATCH_SIZE>|--waiting-served-ratio <WAITING_SERVED_RATIO>|--max-batch-total-tokens <MAX_BATCH_TOTAL_TOKENS>|--max-waiting-tokens <MAX_WAITING_TOKENS>|--port <PORT>|--shard-uds-path <SHARD_UDS_PATH>|--master-addr <MASTER_ADDR>|--master-port <MASTER_PORT>|--huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>|--weights-cache-override <WEIGHTS_CACHE_OVERRIDE>|--disable-custom-kernels|--json-output|--otlp-endpoint <OTLP_ENDPOINT>|--cors-allow-origin <CORS_ALLOW_ORIGIN>|--watermark-gamma <WATERMARK_GAMMA>|--watermark-delta <WATERMARK_DELTA>|--env>
```
Yeah, facing the same issue: the model gets converted to safetensors and then the conversion messes it up. Not really sure how to work around it.
Same here. When I add `--auto-convert false` it says the argument isn't found, but when I run without it, it tries to convert the model to safetensors and returns this error:
```
2023-07-03T19:27:56.259279Z  WARN download: text_generation_launcher: No safetensors weights found for model /data/falcon-7b-instruct at revision None. Converting PyTorch weights to safetensors.
Error: DownloadError
2023-07-03T19:29:55.441248Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 9:
```
Same here with a Falcon model.
I also got "`--auto-convert` is not an argument". Would love to be able to use text-generation-inference on models that can't be converted to safetensors.
Did anyone figure out how to use other architectures?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.