text-generation-webui
stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors fails to load
### Describe the bug
This particular model's card indicates it should be compatible with the same version of GPTQ (the branch used here rather than the main Triton one; I have tried that too, but it doesn't seem to work either, producing different errors) that I am able to use for gpt-x-alpaca. It doesn't seem to work, though: it gives errors with pre_layer 35, and if I set pre_layer to 0 it loads but won't generate anything.
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
As far as I know, this failure to use stable-vicuna should happen to anyone running on Linux, at least. I haven't attempted it on Windows yet.
### Screenshot
No response
### Logs
The error I get if I set pre_layer to 35, where the model does not load:
```
Traceback (most recent call last):
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/server.py", line 102, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/modules/models.py", line 158, in load_model
    model = load_quantized(model_name)
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/modules/GPTQ_loader.py", line 173, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference_offload.py", line 226, in load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "/home/korodarn/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.up_proj.bias", ... (the same seven .bias keys repeated for every layer through model.layers.39).
    Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.g_idx", "model.layers.0.self_attn.o_proj.g_idx", "model.layers.0.self_attn.q_proj.g_idx", "model.layers.0.self_attn.v_proj.g_idx", "model.layers.0.mlp.down_proj.g_idx", "model.layers.0.mlp.gate_proj.g_idx", "model.layers.0.mlp.up_proj.g_idx", ... (the same seven .g_idx keys repeated for every layer through model.layers.39).
```
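To confirm what the checkpoint actually contains, the tensor keys can be listed directly from the .safetensors file. This is just a sketch (the path is an assumption; point it at your own copy); on a file matching the error above it reports g_idx tensors and no bias tensors, which is exactly the mismatch the state_dict error complains about.

```python
# Sketch: list the tensor keys stored in the quantized checkpoint.
# The path below is an assumption; point it at your own .safetensors file.
from safetensors import safe_open

ckpt = "models/TheBloke_stable-vicuna-13B-GPTQ/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors"

with safe_open(ckpt, framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(len(keys), "tensors in checkpoint")
print("g_idx tensors:", sum(k.endswith(".g_idx") for k in keys))
print("bias tensors: ", sum(k.endswith(".bias") for k in keys))

# Keys for one module, to compare against the error message above:
for k in sorted(keys):
    if k.startswith("model.layers.0.self_attn.k_proj"):
        print(k)
```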
The error I get if I set pre_layer to 0, where the model does load, but any attempt to get a result back returns an error:
```
Loaded the model in 3.76 seconds.
Traceback (most recent call last):
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/modules/text_generation.py", line 290, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/korodarn/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/korodarn/.conda/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1563, in generate
    return self.sample(
  File "/home/korodarn/.conda/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2610, in sample
    outputs = self(
  File "/home/korodarn/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/korodarn/.conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/home/korodarn/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/16TB_BTRFS/Projects/Matthew/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference_offload.py", line 135, in forward
    if idx <= (self.preload - 1):
  File "/home/korodarn/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'
Output generated in 0.26 seconds (0.00 tokens/s, 0 tokens, context 23, seed 1784147284)
```
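For context, the AttributeError is consistent with self.preload only being assigned when layer offloading is actually configured, while forward() reads it unconditionally. Below is a minimal, self-contained sketch of that failure pattern; the class and attribute names are borrowed from the traceback, but the body is illustrative only, not the actual GPTQ-for-LLaMa code.

```python
# Illustrative reproduction of the failure pattern in the traceback above.
# NOT the actual GPTQ-for-LLaMa code, just the shape of the bug:
# the attribute is only set when offloading is requested, but forward() always reads it.
import torch.nn as nn


class Offload_LlamaModel(nn.Module):
    def __init__(self, pre_layer: int):
        super().__init__()
        if pre_layer > 0:
            self.preload = pre_layer  # only created when some layers are pre-loaded

    def forward(self, idx: int) -> bool:
        # With pre_layer == 0 this raises the AttributeError seen in the log.
        return idx <= (self.preload - 1)


try:
    Offload_LlamaModel(pre_layer=0).forward(0)
except AttributeError as e:
    print(e)  # 'Offload_LlamaModel' object has no attribute 'preload'
```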
### System Info
```shell
korodarn@kenshin
----------------
OS: EndeavourOS Linux x86_64
Kernel: 6.2.12-arch1-1
Uptime: 1 hour, 21 mins
Packages: 1342 (pacman), 10 (flatpak)
Shell: bash 5.1.16
Resolution: 2560x1440, 3440x1440, 2560x1440
DE: Plasma 5.27.4
WM: KWin
Theme: [Plasma], Sweet-Ambar-Blue-v40 [GTK3]
Icons: [Plasma], candy-icons [GTK2/3]
Terminal: konsole
CPU: AMD Ryzen 7 5800X (16) @ 4.000GHz
GPU: NVIDIA GeForce RTX 3080
Memory: 8900MiB / 32002MiB
```
I'm getting this error too btw, so I'm not sure it's limited to you.
Same problem here..
What command did you use to run the server? It works fine for me.
Use this one, it works fine (I had the same problem with the one you guys are referring to): https://huggingface.co/4bit/vicuna-v1.1-13b-GPTQ-4bit-128g
Are vicuna-v1.1 and stable-vicuna effectively the same?
@korodarn no, they're different models
I'm also getting an error on load on Win10 (RTX 2060, 8GB GPU) when using the --pre_layer 30 flag.
```
Starting the web UI...
Gradio HTTP request redirected to localhost :)
Loading TheBloke_stable-vicuna-13B-GPTQ...
Found the following quantized model: models\TheBloke_stable-vicuna-13B-GPTQ\stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors
Loading model ...
Traceback (most recent call last):
  File "C:\oobaBoo\oobabooga-windows\text-generation-webui\server.py", line 914, in
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\oobaBoo\oobabooga-windows\text-generation-webui\modules\models.py", line 158, in load_model
    model = load_quantized(model_name)
  File "C:\oobaBoo\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 173, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
  File "C:\oobaBoo\oobabooga-windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 226, in load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "C:\oobaBoo\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.up_proj.bias", ... (the same seven .bias keys repeated for every layer through model.layers.39).
    Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.g_idx", "model.layers.0.self_attn.o_proj.g_idx", "model.layers.0.self_attn.q_proj.g_idx", "model.layers.0.self_attn.v_proj.g_idx", "model.layers.0.mlp.down_proj.g_idx", "model.layers.0.mlp.gate_proj.g_idx", "model.layers.0.mlp.up_proj.g_idx", ... (the same seven .g_idx keys repeated for every layer through model.layers.39).
Press any key to continue . . .
```
Same issue with https://huggingface.co/reeducator/vicuna-13b-free
Same issue here. I have tried combinations of the following flags to load it at start: --wbits 4 --groupsize 128 --pre_layer 32 --load-in-8bit --model TheBloke_stable-vicuna-13B-GPTQ, but no luck.
12GB GPU, 64GB system RAM.
Metered connection here, not keen to DL the 7GB file again. Has anyone tried deleting and downloading it a second time? Could it be corrupted?
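If anyone does suspect corruption, one way to check without re-downloading is to hash the local file and compare the result by hand against the checksum shown on the model's Hugging Face file page. A small sketch; the path is an assumption about where the file was saved.

```python
# Sketch: compute the SHA-256 of the local download and compare it manually
# with the checksum listed on the Hugging Face file page for this model.
import hashlib

path = "models/TheBloke_stable-vicuna-13B-GPTQ/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors"

h = hashlib.sha256()
with open(path, "rb") as f:
    # Read in 1 MiB chunks so the 7 GB file does not have to fit in memory.
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        h.update(chunk)

print(h.hexdigest())
```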
I tried installing and uninstalling it multiple times; I get the same issue every time.
Same.
2070 8GB Q-D, 32GB RAM
Same issue here.
Has anyone found a solution? What is even causing the issue?
Looks like it's caused by using pre_layer. No solution yet that I'm aware of.
@oobabooga -sama...
I have the same problem. When I change the pre_layer flag, it gives this error until I restart the command window.
AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'
This only happens with quantized models such as:
- gpt4-x-alpaca-13b-native-4bit-128g
- TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g
- ausboss_WizardLM-13B-Uncensored-4bit-128g
I get the error only with "TheBloke"; the ausboss Wizard works very well. It even works with pre_layer as a startup argument.
Cheers
FOUND THE SOLUTION:
- Go to "TextGen-WebUI\text-generation-webui\repositories\GPTQ-for-LLaMa"
- Open this file in a text editor: "llama_inference_offload.py"
- Go to line 21 and create a new line below it
- Type this in: self.preload = <your pre_layer value> (I put 32; see the sketch after these steps)
- Save the file and restart the webui
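Roughly, the edit looks like the snippet below. Treat it as a sketch only: the exact line number and the surrounding code vary between GPTQ-for-LLaMa revisions, and the hard-coded 32 is just the example value from the steps above.

```python
# Sketch of the workaround in repositories/GPTQ-for-LLaMa/llama_inference_offload.py.
# The point is simply to make sure self.preload exists before forward() reads it;
# the exact class layout and line numbers depend on your GPTQ-for-LLaMa revision.
from transformers.models.llama.modeling_llama import LlamaModel


class Offload_LlamaModel(LlamaModel):
    def __init__(self, config):
        super().__init__(config)
        self.preload = 32  # workaround: hard-code your pre-loaded layer count here
```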
@TaxDodger
Can you upload the entire Python file, so those of us with a different GPTQ version can see exactly what you mean?
Also, did you have the issue with "Missing key(s) in state_dict", or only with "AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'"? Because I am experiencing both.
Thanks in advance!
Also, ooba-sama, please fix.
@hpnyaggerman
Sure, here it is: https://pastebin.com/raw/PzbddAG6. You don't really need it, though; just add that one line of code and it should resolve it. And yes, I also got both the missing keys error and the attribute error: the missing keys error when I tried loading the model through the selection screen in the webui console menu, and the attribute error when I tried to talk to the AI. But AFAIK this should fix it for everyone having this issue.
> FOUND THE SOLUTION:
> 1. Go to "TextGen-WebUI\text-generation-webui\repositories\GPTQ-for-LLaMa"
> 2. Open this file in a text editor: "llama_inference_offload.py"
> 3. Go to line 21 and create a new line below it
> 4. Type this in: self.preload = <your pre_layer value> (I put 32)
> 5. Save the file and restart the webui

Tried this but I still got the error when loading the model with --pre_layer 30
@boricuapab I didn't use --pre_layer, or, well, kept it at 0, and it worked for me.
> FOUND THE SOLUTION:
> 1. Go to "TextGen-WebUI\text-generation-webui\repositories\GPTQ-for-LLaMa"
> 2. Open this file in a text editor: "llama_inference_offload.py"
> 3. Go to line 21 and create a new line below it
> 4. Type this in: self.preload = <your pre_layer value> (I put 32)
> 5. Save the file and restart the webui
>
> Tried this but I still got the error when loading the model with --pre_layer 30

Try removing the --pre_layer flag first, then try again. If that doesn't work, delete the GPTQ-for-LLaMa folder (found in repositories), run the Windows update bat file, and do the same steps again.
Yes, it loads without the --pre_layer flag, but I run out of VRAM after my input since I only have an 8GB card, which is why I need to use the --pre_layer flag. But TheBloke responded on his HF page as to why pre_layer doesn't work with his stable-vicuna 13B.
I am able to use TheBloke's Wizard-Vicuna 13B Uncensored model with the pre_layer flag.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.