
[Documentation] Unclear how to use other architectures

Open louis030195 opened this issue 2 years ago • 8 comments

In your README you list the optimized architectures and say:

Other architectures are supported on a best effort basis using:

AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")

or

AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
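For context, this fallback is not something you pass on the command line: when the architecture is not in the optimized list, the server itself falls back to the generic Auto* loaders. A rough sketch of that choice (`pick_auto_class` is a hypothetical illustration, not TGI's actual function; the real logic lives inside the server):

```python
# Hedged sketch: how the "best effort" fallback roughly chooses a loader.
# `pick_auto_class` is a hypothetical helper, not TGI's real code.

def pick_auto_class(config) -> str:
    """Seq2seq configs (T5-style) set is_encoder_decoder=True."""
    if getattr(config, "is_encoder_decoder", False):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"

# Loading such a model by hand then looks like (requires `transformers`,
# enough memory, and -- for baichuan -- trusting its custom modeling code):
#
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "baichuan-inc/baichuan-7B",
#       device_map="auto",        # spread weights across available devices
#       trust_remote_code=True,   # baichuan ships custom modeling code
#   )
```

With the Docker launcher there is nothing extra to pass for the fallback itself; note the Args log below also shows a `trust_remote_code` setting, which baichuan's custom modeling code would likely need enabled.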

Can you explain where we have to do this? I'm trying to run baichuan-inc/baichuan-7B

model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard
(start.sh)

2023-06-21T01:41:52.798477Z  INFO text_generation_launcher: Args { model_id: "baichuan-inc/baichuan-7B", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-21T01:41:52.798592Z  INFO text_generation_launcher: Starting download process.
2023-06-21T01:41:56.934136Z  WARN download: text_generation_launcher: No safetensors weights found for model baichuan-inc/baichuan-7B at revision None. Converting PyTorch weights to safetensors.
2023-06-21T01:41:56.934202Z  INFO download: text_generation_launcher: Convert /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/pytorch_model.bin to /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/model.safetensors.
Error: DownloadError
2023-06-21T01:42:03.924426Z ERROR text_generation_launcher: Download encountered an error:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(

RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layer

louis030195 avatar Jun 21 '23 01:06 louis030195
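The RuntimeError above comes from safetensors refusing state dicts where several parameters point at the same underlying storage. A minimal pure-Python sketch of that check (mocked "tensors" stand in for real ones; `find_shared` is a hypothetical helper that mirrors, in spirit, the grouping safetensors performs):

```python
# Pure-Python sketch of the shared-memory check behind the error above.
# No torch needed here; each mock "tensor" is a dict whose data_ptr
# identifies its storage. `find_shared` is a hypothetical helper.
from collections import defaultdict

def find_shared(state_dict):
    """Group parameter names by storage pointer; return groups of size > 1."""
    by_storage = defaultdict(list)
    for name, tensor in state_dict.items():
        by_storage[tensor["data_ptr"]].append(name)
    return [sorted(names) for names in by_storage.values() if len(names) > 1]

buf = {"data_ptr": 0xDEADBEEF}  # one storage referenced by two parameters
state = {
    "model.layers.0.mlp.gate_proj.weight": buf,
    "model.layers.0.mlp.up_proj.weight": buf,
    "model.layers.0.input_layernorm.weight": {"data_ptr": 0xCAFE},
}
print(find_shared(state))
# → [['model.layers.0.mlp.gate_proj.weight', 'model.layers.0.mlp.up_proj.weight']]
```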

Try with --auto-convert false.

This error happens when trying to convert to safetensors, but conversion shouldn't be required for non-core models.

Narsil avatar Jun 21 '23 10:06 Narsil

This model seems to be sharing its gate_proj, but the modeling code doesn't reflect that: https://huggingface.co/baichuan-inc/baichuan-7B/blob/main/modeling_baichuan.py Not sure if it's intentional.

Narsil avatar Jun 21 '23 10:06 Narsil
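One manual workaround for such shared-weight checkpoints (hedged sketch; `untie` is a hypothetical helper and the paths are illustrative, not TGI's own tooling) is to clone every tensor so no two entries share storage, at the cost of duplicating the shared weight on disk:

```python
# Hedged sketch of a manual conversion workaround. `untie` is a
# hypothetical helper; paths below are illustrative.

def untie(state_dict):
    """Return a state dict where every value owns its own storage (via clone)."""
    return {name: t.clone() for name, t in state_dict.items()}

# With real weights this would look like (requires torch + safetensors):
#
#   import torch
#   from safetensors.torch import save_file
#   state = torch.load("pytorch_model.bin", map_location="cpu")
#   save_file(untie(state), "model.safetensors", metadata={"format": "pt"})
```

Placing the resulting model.safetensors next to the checkpoint should make the launcher skip its own conversion step.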

thanks @Narsil

Tried --auto-convert, but it's not in the args?

error: unexpected argument '--auto-convert' found

Usage: text-generation-launcher <--model-id <MODEL_ID>|--revision <REVISION>|--sharded <SHARDED>|--num-shard <NUM_SHARD>|--quantize <QUANTIZE>|--trust-remote-code|--max-concurrent-requests <MAX_CONCURRENT_REQUESTS>|--max-best-of <MAX_BEST_OF>|--max-stop-sequences <MAX_STOP_SEQUENCES>|--max-input-length <MAX_INPUT_LENGTH>|--max-total-tokens <MAX_TOTAL_TOKENS>|--max-batch-size <MAX_BATCH_SIZE>|--waiting-served-ratio <WAITING_SERVED_RATIO>|--max-batch-total-tokens <MAX_BATCH_TOTAL_TOKENS>|--max-waiting-tokens <MAX_WAITING_TOKENS>|--port <PORT>|--shard-uds-path <SHARD_UDS_PATH>|--master-addr <MASTER_ADDR>|--master-port <MASTER_PORT>|--huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>|--weights-cache-override <WEIGHTS_CACHE_OVERRIDE>|--disable-custom-kernels|--json-output|--otlp-endpoint <OTLP_ENDPOINT>|--cors-allow-origin <CORS_ALLOW_ORIGIN>|--watermark-gamma <WATERMARK_GAMMA>|--watermark-delta <WATERMARK_DELTA>|--env>

louis030195 avatar Jun 22 '23 00:06 louis030195

Yeah, facing the same issue: the model gets converted to safetensors and then it messes up. Not really sure how to figure that out.

mantrakp04 avatar Jun 23 '23 21:06 mantrakp04

Same. When I add --auto-convert false it says the argument isn't found, but when I run without it, it tries to convert the model to safetensors and returns this error:

2023-07-03T19:27:56.259279Z  WARN download: text_generation_launcher: No safetensors weights found for model /data/falcon-7b-instruct at revision None. Converting PyTorch weights to safetensors.

Error: DownloadError
2023-07-03T19:29:55.441248Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 9:

ByteEvangelist avatar Jul 03 '23 19:07 ByteEvangelist

Same here with a Falcon model.

TalhaUusuf avatar Jul 05 '23 19:07 TalhaUusuf

I also got --auto-convert rejected as an unexpected argument. Would love to be able to use text-generation-inference on models which can't be converted to safetensors.

bealbrown avatar Jul 05 '23 20:07 bealbrown

Did anyone figure this out on how to use other architectures?

shannonphu avatar Nov 16 '23 03:11 shannonphu

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jul 12 '24 01:07 github-actions[bot]