text-generation-inference
.bin weights not found for model
System Info
I get this error after running the Docker command below with the model https://huggingface.co/huggingface/falcon-40b-gptq:
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model huggingface/falcon-40b-gptq and revision None.
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
sudo docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
Expected behavior
The server should start and serve the model.
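Before launching the container, one way to check whether the repo actually ships .bin or .safetensors weights is to list its files via huggingface_hub. This is a hedged sketch for diagnosis, not TGI code; list_repo_files is a real huggingface_hub API, while filter_weight_files and check_repo are hypothetical helpers introduced here for illustration:

```python
def filter_weight_files(filenames, extension):
    """Keep only filenames ending in the given extension (hypothetical helper)."""
    return [f for f in filenames if f.endswith(extension)]

def check_repo(model_id):
    """Print which weight format a Hub repo provides, if any (hypothetical helper)."""
    # list_repo_files is part of the real huggingface_hub package.
    from huggingface_hub import list_repo_files
    files = list_repo_files(model_id)
    for ext in (".safetensors", ".bin"):
        matches = filter_weight_files(files, ext)
        if matches:
            print(f"{model_id}: found {ext} weights: {matches}")
            return
    print(f"{model_id}: no .safetensors or .bin weights found")
```

Running check_repo on the model id before starting the container shows whether the EntryNotFoundError is about the repo's contents or about which extension the server is looking for.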
Hi @mayurtikundi12, you need to use the latest image for this model to work.
We're going to release 0.9 soon, which should work. @OlivierDehaene (for vis)
With 1.1.0 it is not working:
model=sigmareaver/flan-ul2-4bit-128g-gptq
volume=$PWD/flan-ul2-4bit-128g-gptq-data
docker run --gpus all --shm-size 24g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model --max-total-tokens 5024 --max-input-length 4096 --num-shard 4 --max-concurrent-requests 128
@chintanckg, add --quantize gptq
model=sigmareaver/flan-ul2-4bit-128g-gptq
volume=$PWD/flan-ul2-4bit-128g-gptq-data
docker run --gpus all --shm-size 24g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --max-total-tokens 5024 --max-input-length 4096 --num-shard 4 --max-concurrent-requests 128 --quantize gptq
Output:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 115, in download_weights
utils.weight_files(model_id, revision, extension)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 101, in weight_files
pt_filenames = weight_hub_files(model_id, revision, extension=".bin")
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 37, in weight_hub_files
raise EntryNotFoundError(
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model sigmareaver/flan-ul2-4bit-128g-gptq and revision None.
@OlivierDehaene -- Still the same issue, please advise.
I am also seeing this error when loading a model by path that has safetensors and not .bin weights.
TheBloke/Llama-2-7B-Chat-GPTQ
@OlivierDehaene
https://github.com/huggingface/text-generation-inference/blob/96a982ad8fc232479384476b1596a880697cc1d0/server/text_generation_server/cli.py#L156
Shouldn't this line avoid hardcoding ".bin" here? I think this could be the cause of this issue.
It then hits this block and raises this error:
https://github.com/huggingface/text-generation-inference/blob/96a982ad8fc232479384476b1596a880697cc1d0/server/text_generation_server/utils/hub.py#L95-L99
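One possible direction, sketched here as an assumption (this is not the actual TGI patch): instead of hardcoding ".bin", prefer ".safetensors" and fall back to ".bin" only if no safetensors files exist. pick_weight_extension is a hypothetical function illustrating that logic:

```python
def pick_weight_extension(filenames):
    """Hypothetical sketch: return the first extension for which the repo
    actually has weight files, preferring safetensors over .bin."""
    for ext in (".safetensors", ".bin"):
        if any(name.endswith(ext) for name in filenames):
            return ext
    raise FileNotFoundError("no .safetensors or .bin weights found")
```

With this kind of fallback, a GPTQ repo that only publishes .safetensors weights (like sigmareaver/flan-ul2-4bit-128g-gptq) would not trip the ".bin" check.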
Same issue with a .gguf finetuned model. Any updates?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.