text-generation-webui
Could not find the quantized model in .pt or .safetensors format, exiting...
Describe the bug
Could not find the quantized model in .pt or .safetensors format, exiting...
https://github.com/TimDettmers/bitsandbytes/issues/311
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
```shell
cp .env.example .env
docker compose up --build
```
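(For reference: the key variable in the copied .env is CLI_ARGS, which later comments in this thread adjust. The value below is a hypothetical illustration based on the "Loading llama-7b-4bit..." line in the log, not the repo's actual default.)

```shell
# hypothetical .env excerpt; only CLI_ARGS matters for the error in this issue
CLI_ARGS=--model llama-7b-4bit --chat --wbits 4 --listen --auto-devices
```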
Screenshot
No response
Logs
```
[+] Building 1253.0s (39/39) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.19kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 133B 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04 10.3s
=> [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 10.3s
=> [auth] nvidia/cuda:pull token for registry-1.docker.io 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 661.01kB 0.1s
=> [stage-1 1/24] FROM docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04@sha256:9b9ce0e128463d147a58b5013255c60e7eb725141f37c197b1ddee5aeb7e4161 121.4s
=> => resolve docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04@sha256:9b9ce0e128463d147a58b5013255c60e7eb725141f37c197b1ddee5aeb7e4161 4.4s
=> => sha256:1df9aa4f83a99b4c7df79234a770ad6313a54a56ed98d1df30d95efbba57bf5c 14.01kB / 14.01kB 0.0s
=> => sha256:bc572704fd2210f6f004e9ee692d23852aa450426fa0d53f79a6f8c48cc97933 4.60MB / 4.60MB 5.1s
=> => sha256:9b9ce0e128463d147a58b5013255c60e7eb725141f37c197b1ddee5aeb7e4161 743B / 743B 0.0s
=> => sha256:cf5ef7c8352acc84dcb538a991f05f23ffe4cecc3300212bc921e749b7297433 2.21kB / 2.21kB 0.0s
=> => sha256:677076032cca0a2362d25cf3660072e738d1b96fe860409a33ce901d695d7ee8 29.53MB / 29.53MB 6.4s
=> => sha256:82ca2dd0fe9d6f9d2863bba4353929b5758edef368903821e269f324f10b346d 56.23MB / 56.23MB 7.9s
=> => sha256:335006729f702b972677393d68888e7296f44146c9cbbeb2373dfba63c98702c 188B / 188B 7.1s
=> => sha256:1b9f8e302abf3e7cff2990322faee0d675fe3bf2577c9ececa08d197692e1bd9 6.43kB / 6.43kB 7.4s
=> => extracting sha256:677076032cca0a2362d25cf3660072e738d1b96fe860409a33ce901d695d7ee8 0.8s
=> => sha256:120deaf0783ebcfe38d90e008bf670455864e76c4e2e12e015eb0f1eb3437af2 1.38GB / 1.38GB 101.7s
=> => extracting sha256:bc572704fd2210f6f004e9ee692d23852aa450426fa0d53f79a6f8c48cc97933 0.2s
=> => sha256:f7b8d7bf559f962887e08ebc8e5c9be1de826ca214571ed239264ee899e2e308 63.66kB / 63.66kB 8.5s
=> => sha256:e62d0dcce85d430727f1c63d5b87a74c51e31c1bc5c02333a037370ef0bf60e2 1.68kB / 1.68kB 9.3s
=> => extracting sha256:82ca2dd0fe9d6f9d2863bba4353929b5758edef368903821e269f324f10b346d 0.8s
=> => sha256:dd4b12c0cbdb99ca7bc910ab016e62af878770d6e27d7a269b935446ee88409e 1.52kB / 1.52kB 9.8s
=> => extracting sha256:335006729f702b972677393d68888e7296f44146c9cbbeb2373dfba63c98702c 0.0s
=> => extracting sha256:1b9f8e302abf3e7cff2990322faee0d675fe3bf2577c9ececa08d197692e1bd9 0.0s
=> => extracting sha256:120deaf0783ebcfe38d90e008bf670455864e76c4e2e12e015eb0f1eb3437af2 14.6s
=> => extracting sha256:f7b8d7bf559f962887e08ebc8e5c9be1de826ca214571ed239264ee899e2e308 0.0s
=> => extracting sha256:e62d0dcce85d430727f1c63d5b87a74c51e31c1bc5c02333a037370ef0bf60e2 0.0s
=> => extracting sha256:dd4b12c0cbdb99ca7bc910ab016e62af878770d6e27d7a269b935446ee88409e 0.0s
=> [builder 1/7] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:9ac394082aed016f825d89739ae691a51ab75f7091154ec44b68bc8c07a6f2e6 143.1s
=> => resolve docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:9ac394082aed016f825d89739ae691a51ab75f7091154ec44b68bc8c07a6f2e6 4.5s
=> => sha256:0fc9e991501cc79708686c0c1fa25a2c462eedee79487d62c130b182ed8ee4c6 17.79kB / 17.79kB 0.0s
=> => sha256:82ca2dd0fe9d6f9d2863bba4353929b5758edef368903821e269f324f10b346d 56.23MB / 56.23MB 7.7s
=> => sha256:9ac394082aed016f825d89739ae691a51ab75f7091154ec44b68bc8c07a6f2e6 743B / 743B 0.0s
=> => sha256:1d36277d7f886815b2548cc457e3d510006c0252359e9b28c92ed617f28edd72 2.63kB / 2.63kB 0.0s
=> => sha256:677076032cca0a2362d25cf3660072e738d1b96fe860409a33ce901d695d7ee8 29.53MB / 29.53MB 6.2s
=> => sha256:bc572704fd2210f6f004e9ee692d23852aa450426fa0d53f79a6f8c48cc97933 4.60MB / 4.60MB 5.0s
=> => sha256:335006729f702b972677393d68888e7296f44146c9cbbeb2373dfba63c98702c 188B / 188B 6.9s
=> => sha256:1b9f8e302abf3e7cff2990322faee0d675fe3bf2577c9ececa08d197692e1bd9 6.43kB / 6.43kB 7.3s
=> => extracting sha256:677076032cca0a2362d25cf3660072e738d1b96fe860409a33ce901d695d7ee8 1231.6s
=> => sha256:120deaf0783ebcfe38d90e008bf670455864e76c4e2e12e015eb0f1eb3437af2 1.38GB / 1.38GB 101.6s
=> => extracting sha256:bc572704fd2210f6f004e9ee692d23852aa450426fa0d53f79a6f8c48cc97933 1230.7s
=> => sha256:f7b8d7bf559f962887e08ebc8e5c9be1de826ca214571ed239264ee899e2e308 63.66kB / 63.66kB 8.4s
=> => extracting sha256:82ca2dd0fe9d6f9d2863bba4353929b5758edef368903821e269f324f10b346d 1230.2s
=> => sha256:e62d0dcce85d430727f1c63d5b87a74c51e31c1bc5c02333a037370ef0bf60e2 1.68kB / 1.68kB 9.1s
=> => sha256:dd4b12c0cbdb99ca7bc910ab016e62af878770d6e27d7a269b935446ee88409e 1.52kB / 1.52kB 9.7s
=> => extracting sha256:335006729f702b972677393d68888e7296f44146c9cbbeb2373dfba63c98702c 0.0s
=> => extracting sha256:1b9f8e302abf3e7cff2990322faee0d675fe3bf2577c9ececa08d197692e1bd9 1229.3s
=> => sha256:96670d94e1e8bb9caae27ab3a6e3c2699c44a95fa21fc68b99bee8e38ccbc8e7 1.81GB / 1.81GB 112.5s
=> => sha256:bb10049f791d096d47ec777949b7e046262abf71b18b75317784f0f44d696460 87.66kB / 87.66kB 10.8s
=> => extracting sha256:120deaf0783ebcfe38d90e008bf670455864e76c4e2e12e015eb0f1eb3437af2 1136.4s
=> => extracting sha256:f7b8d7bf559f962887e08ebc8e5c9be1de826ca214571ed239264ee899e2e308 0.0s
=> => extracting sha256:e62d0dcce85d430727f1c63d5b87a74c51e31c1bc5c02333a037370ef0bf60e2 1121.4s
=> => extracting sha256:dd4b12c0cbdb99ca7bc910ab016e62af878770d6e27d7a269b935446ee88409e 1121.3s
=> => extracting sha256:96670d94e1e8bb9caae27ab3a6e3c2699c44a95fa21fc68b99bee8e38ccbc8e7 21.7s
=> => extracting sha256:bb10049f791d096d47ec777949b7e046262abf71b18b75317784f0f44d696460 0.0s
=> [auth] nvidia/cuda:pull token for registry-1.docker.io 0.0s
=> [stage-1 2/24] RUN apt-get update && apt-get install --no-install-recommends -y git python3 python3-pip make g++ && rm -rf /var/lib/apt/lists/* 61.8s
=> [builder 2/7] RUN apt-get update && apt-get install --no-install-recommends -y git vim build-essential python3-dev python3-venv && rm -rf /var/lib/ 37.3s
=> [builder 3/7] RUN git clone https://github.com/oobabooga/GPTQ-for-LLaMa /build 6.0s
=> [stage-1 3/24] RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv 13.0s
=> [builder 4/7] WORKDIR /build 0.1s
=> [builder 5/7] RUN python3 -m venv /build/venv 3.2s
=> [builder 6/7] RUN . /build/venv/bin/activate && pip3 install --upgrade pip setuptools && pip3 install torch torchvision torchaudio && pip3 ins 645.7s
=> [stage-1 4/24] RUN mkdir /app 0.5s
=> [stage-1 5/24] WORKDIR /app 0.0s
=> [stage-1 6/24] RUN test -n "HEAD" && git reset --hard HEAD || echo "Using provided webui source" 0.6s
=> [stage-1 7/24] RUN virtualenv /app/venv 1.1s
=> [stage-1 8/24] RUN . /app/venv/bin/activate && pip3 install --upgrade pip setuptools && pip3 install torch torchvision torchaudio 588.6s
=> [builder 7/7] RUN . /build/venv/bin/activate && python3 setup_cuda.py bdist_wheel -d . 81.1s
=> [stage-1 9/24] COPY --from=builder /build /app/repositories/GPTQ-for-LLaMa 9.0s
=> [stage-1 10/24] RUN . /app/venv/bin/activate && pip3 install /app/repositories/GPTQ-for-LLaMa/*.whl 1.0s
=> [stage-1 11/24] COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt 0.0s
=> [stage-1 12/24] COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt 0.0s
=> [stage-1 13/24] COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt 0.0s
=> [stage-1 14/24] COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt 0.2s
=> [stage-1 15/24] COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt 0.1s
=> [stage-1 16/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requirements.txt 10.2s
=> [stage-1 17/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r requirements.txt 10.4s
=> [stage-1 18/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install -r requirements.tx 8.1s
=> [stage-1 19/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r requirements.txt 29.3s
=> [stage-1 20/24] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r requirements.txt 53.6s
=> [stage-1 21/24] COPY requirements.txt /app/requirements.txt 0.0s
=> [stage-1 22/24] RUN . /app/venv/bin/activate && pip3 install -r requirements.txt 158.7s
=> [stage-1 23/24] RUN cp /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so /app/venv/lib/python3.10/site-packages/bitsandbytes/li 0.4s
=> [stage-1 24/24] COPY . /app/ 0.0s
=> exporting to image 28.2s
=> => exporting layers 28.2s
=> => writing image sha256:3b88829d27bbb4ac812b8119ec729bad9cabada0aab1a8d2976960b18b033a0a 0.0s
=> => naming to docker.io/library/text-generation-webui-text-generation-webui 0.0s
[+] Running 2/2
- Network text-generation-webui_default Created 0.0s
- Container text-generation-webui-text-generation-webui-1 Created 0.1s
Attaching to text-generation-webui-text-generation-webui-1
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | ==========
text-generation-webui-text-generation-webui-1 | == CUDA ==
text-generation-webui-text-generation-webui-1 | ==========
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | CUDA Version 11.8.0
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
text-generation-webui-text-generation-webui-1 | By pulling and using the container, you accept the terms and conditions of this license:
text-generation-webui-text-generation-webui-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 |
text-generation-webui-text-generation-webui-1 | ===================================BUG REPORT===================================
text-generation-webui-text-generation-webui-1 | Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
text-generation-webui-text-generation-webui-1 | ================================================================================
text-generation-webui-text-generation-webui-1 | /app/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
text-generation-webui-text-generation-webui-1 | warn(msg)
text-generation-webui-text-generation-webui-1 | /app/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
text-generation-webui-text-generation-webui-1 | warn(msg)
text-generation-webui-text-generation-webui-1 | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
text-generation-webui-text-generation-webui-1 | /app/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
text-generation-webui-text-generation-webui-1 | warn(msg)
text-generation-webui-text-generation-webui-1 | CUDA SETUP: Highest compute capability among GPUs detected: 6.1
text-generation-webui-text-generation-webui-1 | CUDA SETUP: Detected CUDA version 117
text-generation-webui-text-generation-webui-1 | /app/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
text-generation-webui-text-generation-webui-1 | warn(msg)
text-generation-webui-text-generation-webui-1 | CUDA SETUP: Loading binary /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
text-generation-webui-text-generation-webui-1 | Loading llama-7b-4bit...
text-generation-webui-text-generation-webui-1 | Could not find the quantized model in .pt or .safetensors format, exiting...
text-generation-webui-text-generation-webui-1 exited with code 0
```
System Info
```shell
Win11 Pro 21H2
NVIDIA GTX1080
WSL2
Docker desktop v4.16.3
```
Yes, I'm getting this exact error with Docker.
```
[+] Running 1/1
 ✔ Container text-generation-webui-text-generation-webui-1  Recreated  0.0s
Attaching to text-generation-webui-text-generation-webui-1
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | ==========
text-generation-webui-text-generation-webui-1  | == CUDA ==
text-generation-webui-text-generation-webui-1  | ==========
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | CUDA Version 11.8.0
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
text-generation-webui-text-generation-webui-1  | By pulling and using the container, you accept the terms and conditions of this license:
text-generation-webui-text-generation-webui-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
text-generation-webui-text-generation-webui-1  |
text-generation-webui-text-generation-webui-1  | Loading gpt-x-alpaca-13b-native-4bit-128g...
text-generation-webui-text-generation-webui-1  | Could not find the quantized model in .pt or .safetensors format, exiting...
text-generation-webui-text-generation-webui-1 exited with code 0
```
Did you change CLI_ARGS= in your .env and rename the folder gpt4-x-alpaca-13b-native-4bit-128g -> gpt-x-alpaca-13b-native-4bit-128g?
CLI_ARGS=--model gpt-x-alpaca-13b-native-4bit-128g --chat --wbits 4 --listen --groupsize 128 --auto-devices
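In shell terms, the suggested rename amounts to something like this (assuming the default models/ directory layout):

```shell
# make the folder name match the name passed via --model
mv models/gpt4-x-alpaca-13b-native-4bit-128g models/gpt-x-alpaca-13b-native-4bit-128g
```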
Hi, I changed the CLI args and did a git pull, but I'm afraid it's still not working. Getting the following error:
```
=> ERROR [stage-1 11/25] COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt 0.0s
=> ERROR [stage-1 12/25] COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt 0.0s
=> ERROR [stage-1 13/25] COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt 0.0s
=> ERROR [stage-1 14/25] COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt 0.0s
=> ERROR [stage-1 15/25] COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt 0.0s
=> CACHED [stage-1 16/25] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requiremen 0.0s
=> CACHED [stage-1 17/25] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r 0.0s
=> CACHED [stage-1 18/25] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install 0.0s
=> CACHED [stage-1 19/25] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r req 0.0s
=> CACHED [stage-1 20/25] RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r re 0.0s
=> CACHED [stage-1 21/25] RUN ls /app 0.0s
=> ERROR [stage-1 22/25] COPY requirements.txt /app/requirements.txt 0.0s
------
> [stage-1 11/25] COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt:
------
------
> [stage-1 12/25] COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt:
------
------
> [stage-1 13/25] COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt:
------
------
> [stage-1 14/25] COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt:
------
------
> [stage-1 15/25] COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt:
------
------
> [stage-1 22/25] COPY requirements.txt /app/requirements.txt:
------
failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::zbk22pkvvqgu401wvr4vrgn2a: "/requirements.txt": not found
```
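Those COPY failures mean the files are missing from the Docker build context. A plausible first check (my assumption; the thread itself doesn't confirm the cause) is to verify that compose is being run from the repo root of a complete checkout:

```shell
# the Dockerfile COPYs these paths from the build context, so they must exist locally
ls requirements.txt extensions/api/requirements.txt extensions/whisper_stt/requirements.txt
```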
@ALL SOLVED:
The source code actually combines the CLI arguments (--wbits, --groupsize) with the model name to figure out which model file to load, so you will pass something like the sketch below, where the actual model file on disk is vicuna-13B-1.1-GPTQ-4bit-128g.safetensors.
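A minimal sketch of that, assuming the flag set from earlier in this thread (the original comment did not show its exact CLI_ARGS value):

```shell
# hedged example: --model plus --wbits/--groupsize must resolve to the on-disk file
# models/vicuna-13B-1.1-GPTQ-4bit-128g/vicuna-13B-1.1-GPTQ-4bit-128g.safetensors
CLI_ARGS=--model vicuna-13B-1.1-GPTQ-4bit-128g --chat --wbits 4 --listen --groupsize 128 --auto-devices
```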
I had to rename models/facebook_opt-1.3b/pytorch_model.bin to models/facebook_opt-1.3b/pytorch_model.bin.pt (command below).
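As a shell one-liner, the workaround described above:

```shell
# rename the checkpoint so the loader finds it as a .pt file
mv models/facebook_opt-1.3b/pytorch_model.bin models/facebook_opt-1.3b/pytorch_model.bin.pt
```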
Update: the model loads but generates garbage.
Update 2: the renamed .bin models don't seem to work; only models in .safetensors format do, e.g. https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML/tree/main
> @ALL SOLVED:
> The source code actually combines the CLI arguments with the model name to figure out which model file to load, so you will pass something like the sketch above, where the actual model name on disk is vicuna-13B-1.1-GPTQ-4bit-128g.safetensors.
Hi, how does this work with a .bin file? If I want to use a model that only ships a .bin checkpoint, such as PygmalionAI/pygmalion-7b, what should I do?
This issue has been closed after 6 weeks of inactivity. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.