text-generation-inference
[Documentation] Unclear how to use other architectures
In your README you list the optimised architectures and say:

> Other architectures are supported on a best-effort basis using:
>
> `AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`
>
> or
>
> `AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`
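For context, a hedged sketch of what that fallback looks like when called directly from transformers (the helper name and the `trust_remote_code` flag are illustrative assumptions on my part, not something the README specifies):

```python
# Hedged sketch of the README's "best effort" fallback, used directly from
# Python rather than through the launcher. The helper name and the
# trust_remote_code choice are assumptions, not TGI API.
def load_best_effort(model_id: str):
    try:
        from transformers import AutoModelForCausalLM
    except ImportError:
        # transformers is not installed in this environment
        return None
    # device_map="auto" lets accelerate place weights across GPUs/CPU
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        trust_remote_code=True,  # needed for models that ship custom modeling code
    )
```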
Can you explain where we have to do this? I'm trying to run baichuan-inc/baichuan-7B:

```shell
model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:0.8 \
  --model-id $model --num-shard $num_shard
```
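One detail worth checking, separate from the conversion error: the launcher log reports `trust_remote_code: false`, and baichuan-7B ships custom modeling code. A hedged variant of the command, adding the launcher's `--trust-remote-code` flag (this addresses custom-code loading only, not the safetensors conversion failure; built as a string here so it can be inspected before running):

```shell
# Sketch: same command as above, plus --trust-remote-code, since
# baichuan-7B relies on custom modeling code.
model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data
cmd="docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:0.8 \
--model-id $model --num-shard $num_shard --trust-remote-code"
echo "$cmd"   # run with: eval "$cmd"
```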
start.sh (END)
```
2023-06-21T01:41:52.798477Z  INFO text_generation_launcher: Args { model_id: "baichuan-inc/baichuan-7B", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-21T01:41:52.798592Z  INFO text_generation_launcher: Starting download process.
2023-06-21T01:41:56.934136Z  WARN download: text_generation_launcher: No safetensors weights found for model baichuan-inc/baichuan-7B at revision None. Converting PyTorch weights to safetensors.
2023-06-21T01:41:56.934202Z  INFO download: text_generation_launcher: Convert /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/pytorch_model.bin to /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/model.safetensors.
Error: DownloadError
2023-06-21T01:42:03.924426Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layer
```
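The RuntimeError comes from safetensors refusing to serialize tensors that alias the same memory. A stdlib-only sketch of the idea (safetensors itself inspects torch storage pointers; here memoryviews over a shared bytearray stand in for tied tensors, and all names are made up for illustration):

```python
# Stdlib stand-in for safetensors' shared-memory check: group "tensors"
# (memoryviews) by the identity of their underlying buffer, and flag any
# buffer referenced by more than one name -- the analogue of tied weights.
from collections import defaultdict

def find_shared(state_dict):
    by_buffer = defaultdict(list)
    for name, view in state_dict.items():
        by_buffer[id(view.obj)].append(name)  # view.obj is the backing buffer
    return [sorted(names) for names in by_buffer.values() if len(names) > 1]

# Two "weights" backed by the same buffer, one independent.
tied = bytearray(8)
state = {
    "lm_head.weight": memoryview(tied),
    "embed_tokens.weight": memoryview(tied),   # tied to lm_head
    "layers.0.mlp.weight": memoryview(bytearray(8)),
}
print(find_shared(state))  # -> [['embed_tokens.weight', 'lm_head.weight']]
```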
Try with --auto-convert false.
This error happens when trying to convert to safetensors, but the conversion shouldn't be required for non-core models.
This model seems to be sharing its gate_proj weights, yet the modeling code doesn't reflect that: https://huggingface.co/baichuan-inc/baichuan-7B/blob/main/modeling_baichuan.py. Not sure if it's intentional.
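A common manual workaround for this kind of aliasing (hedged: something one would do oneself before conversion, not a TGI feature) is to give every entry its own copy of the data. With a torch state dict the equivalent would be cloning each tensor; the stdlib stand-in below copies the backing buffers:

```python
# Stdlib sketch of "untying" before serialization: give every name its own
# private copy of the data so no two entries share a buffer. With torch
# state dicts the analogue is {k: v.clone().contiguous() for k, v in sd.items()}.
def untie(state_dict):
    return {name: memoryview(bytearray(view)) for name, view in state_dict.items()}

tied = bytearray(b"\x01" * 4)
state = {"lm_head.weight": memoryview(tied), "embed.weight": memoryview(tied)}

untied = untie(state)
# After copying, the two entries no longer share a backing buffer...
assert untied["lm_head.weight"].obj is not untied["embed.weight"].obj
# ...but the bytes are identical.
assert bytes(untied["lm_head.weight"]) == bytes(untied["embed.weight"])
```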
Thanks @Narsil.
I tried `--auto-convert`, but it's not in the args?
```
error: unexpected argument '--auto-convert' found

Usage: text-generation-launcher <--model-id <MODEL_ID>|--revision <REVISION>|--sharded <SHARDED>|--num-shard <NUM_SHARD>|--quantize <QUANTIZE>|--trust-remote-code|--max-concurrent-requests <MAX_CONCURRENT_REQUESTS>|--max-best-of <MAX_BEST_OF>|--max-stop-sequences <MAX_STOP_SEQUENCES>|--max-input-length <MAX_INPUT_LENGTH>|--max-total-tokens <MAX_TOTAL_TOKENS>|--max-batch-size <MAX_BATCH_SIZE>|--waiting-served-ratio <WAITING_SERVED_RATIO>|--max-batch-total-tokens <MAX_BATCH_TOTAL_TOKENS>|--max-waiting-tokens <MAX_WAITING_TOKENS>|--port <PORT>|--shard-uds-path <SHARD_UDS_PATH>|--master-addr <MASTER_ADDR>|--master-port <MASTER_PORT>|--huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>|--weights-cache-override <WEIGHTS_CACHE_OVERRIDE>|--disable-custom-kernels|--json-output|--otlp-endpoint <OTLP_ENDPOINT>|--cors-allow-origin <CORS_ALLOW_ORIGIN>|--watermark-gamma <WATERMARK_GAMMA>|--watermark-delta <WATERMARK_DELTA>|--env>
```
Yeah, facing the same issue: the model gets converted to safetensors and then the conversion messes it up. Not really sure how to work around it.
Same here. When I add `--auto-convert false` it says the argument isn't found, but when I run without it, it tries to convert the model to safetensors and returns this error:
```
2023-07-03T19:27:56.259279Z  WARN download: text_generation_launcher: No safetensors weights found for model /data/falcon-7b-instruct at revision None. Converting PyTorch weights to safetensors.
Error: DownloadError
2023-07-03T19:29:55.441248Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 9:
```
Same here with a Falcon model.
I also got "`--auto-convert` is not an argument". Would love to be able to use text-generation-inference on models that can't be converted to safetensors.
Did anyone figure out how to use other architectures?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.