text-generation-inference
.bin weights not found for model
System Info
I get this error after running the Docker command below with the model https://huggingface.co/huggingface/falcon-40b-gptq:
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model huggingface/falcon-40b-gptq and revision None.
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
sudo docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
Expected behavior
The server should start and serve the model.
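Before launching the container, one way to check whether the repo actually ships .bin or .safetensors weights is to list its files via huggingface_hub. This is a hedged sketch for diagnosis, not TGI code; list_repo_files is a real huggingface_hub API, while filter_weight_files and check_repo are hypothetical helpers introduced here for illustration:

```python
def filter_weight_files(filenames, extension):
    """Keep only filenames ending in the given extension (hypothetical helper)."""
    return [f for f in filenames if f.endswith(extension)]

def check_repo(model_id):
    """Print which weight format a Hub repo provides, if any (hypothetical helper)."""
    # list_repo_files is part of the real huggingface_hub package.
    from huggingface_hub import list_repo_files
    files = list_repo_files(model_id)
    for ext in (".safetensors", ".bin"):
        matches = filter_weight_files(files, ext)
        if matches:
            print(f"{model_id}: found {ext} weights: {matches}")
            return
    print(f"{model_id}: no .safetensors or .bin weights found")
```

Running check_repo on the model id before starting the container shows whether the EntryNotFoundError is about the repo's contents or about which extension the server is looking for.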
Hi @mayurtikundi12, you need to use the latest image for this model to work.
We're going to release 0.9 soon, which should work. @OlivierDehaene (for vis)
With 1.1.0 it is not working:
model=sigmareaver/flan-ul2-4bit-128g-gptq
volume=$PWD/flan-ul2-4bit-128g-gptq-data
docker run --gpus all --shm-size 24g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model --max-total-tokens 5024 --max-input-length 4096 --num-shard 4 --max-concurrent-requests 128
@chintanckg, add --quantize gptq
model=sigmareaver/flan-ul2-4bit-128g-gptq
volume=$PWD/flan-ul2-4bit-128g-gptq-data
docker run --gpus all --shm-size 24g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --max-total-tokens 5024 --max-input-length 4096 --num-shard 4 --max-concurrent-requests 128 --quantize gptq
Output:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 115, in download_weights
utils.weight_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 101, in weight_files
pt_filenames = weight_hub_files(model_id, revision, extension=".bin")
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 37, in weight_hub_files
raise EntryNotFoundError(
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model sigmareaver/flan-ul2-4bit-128g-gptq and revision None.
@OlivierDehaene -- Still the same issue, please advise.
I am also seeing this error when loading a model by path that has safetensors and not .bin weights.
TheBloke/Llama-2-7B-Chat-GPTQ
@OlivierDehaene
https://github.com/huggingface/text-generation-inference/blob/96a982ad8fc232479384476b1596a880697cc1d0/server/text_generation_server/cli.py#L156
Shouldn't this line avoid hardcoding ".bin" here? I think this could be the cause of this issue.
It then hits this block and raises this error:
https://github.com/huggingface/text-generation-inference/blob/96a982ad8fc232479384476b1596a880697cc1d0/server/text_generation_server/utils/hub.py#L95-L99
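One possible direction, sketched here as an assumption (this is not the actual TGI patch): instead of hardcoding ".bin", prefer ".safetensors" and fall back to ".bin" only if no safetensors files exist. pick_weight_extension is a hypothetical function illustrating that logic:

```python
def pick_weight_extension(filenames):
    """Hypothetical sketch: return the first extension for which the repo
    actually has weight files, preferring safetensors over .bin."""
    for ext in (".safetensors", ".bin"):
        if any(name.endswith(ext) for name in filenames):
            return ext
    raise FileNotFoundError("no .safetensors or .bin weights found")
```

With this kind of fallback, a GPTQ repo that only publishes .safetensors weights (like sigmareaver/flan-ul2-4bit-128g-gptq) would not trip the ".bin" check.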
Same issue with a .gguf finetuned model. Any updates?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.