LLongMA-2-13b · Hugging Face - No suitable name to keep for saving
System Info
OS Version: Ubuntu 20.04.3 LTS (focal)
Hardware: 8x NVIDIA A100 GPUs
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
I'm running:
docker run --gpus all --shm-size 1g -p 8070:80 -v $volume:/data -e HUGGING_FACE_HUB_TOKEN=<my token> ghcr.io/huggingface/text-generation-inference:0.9.3 --model-id $model --num-shard $num_shard --max-input-length 4000 --max-total-tokens 4096
I'm running this on the latest commit: https://github.com/huggingface/text-generation-inference/commit/1da642bd0e6de28ef499f17cd226264f3ccdc824
Expected behavior
I'm running into this error for the LLongMA-2-13b model on Hugging Face.
2023-07-26T16:39:04.463956Z ERROR download: text_generation_launcher: Download encountered an error: The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 182, in download_weights utils.convert_files(local_pt_files, local_st_files, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files convert_file(pt_file, sf_file, discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 68, in convert_file to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 32, in _remove_duplicate_names raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.7.mlp.gate_proj.weight', ...]
Any idea how that file was created, and by whom?
This model uses a single storage as the backend for all of its tensors. Something like:
import torch

# Every weight is a view into one giant shared storage.
A = torch.zeros((26_000_000_000,))
q_proj = A[:1024 * 1024]
k_proj = A[1024 * 1024 : 1024 * 1024 * 2]
...
This throws off safetensors, which cannot save shared tensors (there are lots of caveats associated with them).
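For illustration, here is a minimal sketch (not the exact TGI conversion path; the tensor names and sizes are made up) of safetensors refusing a state dict whose tensors are views into the same storage:

import torch
from safetensors.torch import save_file

# Two disjoint views into one shared storage, mimicking the layout above.
A = torch.zeros(8 * 1024)
weights = {"q_proj": A[:4 * 1024], "k_proj": A[4 * 1024:]}

try:
    save_file(weights, "model.safetensors")
except RuntimeError as e:
    # safetensors detects the shared storage and refuses to serialize it.
    print(e)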
An easy fix would be to call weights = {k: v.contiguous() for k, v in weights.items()} somewhere in the conversion path.
This would work here, but it's not really a valid general solution (the same caveats apply).
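As a hedged sketch of that workaround: the idea is to give every tensor its own storage before saving. Note the sketch uses .clone() rather than the .contiguous() call suggested above, because .contiguous() returns the same view (and the same shared storage) when a tensor is already contiguous, as the 1-D slices in this toy example are:

import torch
from safetensors.torch import save_file

A = torch.zeros(8 * 1024)
weights = {"q_proj": A[:4 * 1024], "k_proj": A[4 * 1024:]}

# Copy every tensor into its own storage so nothing is shared anymore.
weights = {k: v.clone() for k, v in weights.items()}

save_file(weights, "model.safetensors")  # now succeeds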
@conceptofmind, I think.