LLongMA-2-13b · Hugging Face - No suitable name to keep for saving
System Info
OS Version: Ubuntu 20.04.3 LTS (focal)
Hardware: 8x NVIDIA A100 GPUs
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
I'm running:
docker run --gpus all --shm-size 1g -p 8070:80 -v $volume:/data -e HUGGING_FACE_HUB_TOKEN=<my token> ghcr.io/huggingface/text-generation-inference:0.9.3 --model-id $model --num-shard $num_shard --max-input-length 4000 --max-total-tokens 4096
I'm running this on the latest commit: https://github.com/huggingface/text-generation-inference/commit/1da642bd0e6de28ef499f17cd226264f3ccdc824
Expected behavior
I'm running into this error for the LLongMA-2-13b model on Hugging Face.
2023-07-26T16:39:04.463956Z ERROR download: text_generation_launcher: Download encountered an error: The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 182, in download_weights utils.convert_files(local_pt_files, local_st_files, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files convert_file(pt_file, sf_file, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 68, in convert_file to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 32, in _remove_duplicate_names raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.7.mlp.gate_proj.weight', ...]
Any idea how that file was created, and by whom?
This model uses a single storage as the backend for all of its tensors. Something like:
import torch

# Every weight is a view into one giant shared storage.
A = torch.zeros((26_000_000_000,))
q_proj = A[:1024 * 1024]
k_proj = A[1024 * 1024 : 1024 * 1024 * 2]
...
This throws off safetensors, which cannot save shared tensors (there are lots of caveats associated with them).
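For illustration, here is a minimal sketch (not the exact TGI conversion path; the tensor names and sizes are made up) of safetensors refusing a state dict whose tensors are views into the same storage:

import torch
from safetensors.torch import save_file

# Two disjoint views into one shared storage, mimicking the layout above.
A = torch.zeros(8 * 1024)
weights = {"q_proj": A[:4 * 1024], "k_proj": A[4 * 1024:]}

try:
    save_file(weights, "model.safetensors")
except RuntimeError as e:
    # safetensors detects the shared storage and refuses to serialize it.
    print(e)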
An easy fix would be to call weights = {k: v.contiguous() for k, v in weights.items()} somewhere in the conversion path.
This would work here, but it's not really a valid general solution (the same caveats apply).
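As a hedged sketch of that workaround: the idea is to give every tensor its own storage before saving. Note the sketch uses .clone() rather than the .contiguous() call suggested above, because .contiguous() returns the same view (and the same shared storage) when a tensor is already contiguous, as the 1-D slices in this toy example are:

import torch
from safetensors.torch import save_file

A = torch.zeros(8 * 1024)
weights = {"q_proj": A[:4 * 1024], "k_proj": A[4 * 1024:]}

# Copy every tensor into its own storage so nothing is shared anymore.
weights = {k: v.clone() for k, v in weights.items()}

save_file(weights, "model.safetensors")  # now succeeds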
@conceptofmind, I think.