ChatGLM-6B
After switching to the new model and configuration files, no content is generated
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
After switching to the new model and configuration files, no content is generated. The log output (truncated at the start) is:

```
.bias', 'layers.7.mlp.dense_h_to_4h.weight', 'layers.19.attention.query_key_value.bias', 'layers.19.post_attention_layernorm.bias', 'layers.4.post_attention_layernorm.bias', 'layers.6.attention.query_key_value.bias', 'layers.12.attention.query_key_value.bias', 'layers.5.attention.dense.weight', 'layers.17.attention.query_key_value.weight', 'layers.12.input_layernorm.weight', 'layers.15.post_attention_layernorm.bias', 'layers.8.post_attention_layernorm.weight', 'layers.16.mlp.dense_h_to_4h.weight', 'layers.2.attention.query_key_value.weight', 'layers.14.input_layernorm.bias', 'layers.1.post_attention_layernorm.weight', 'layers.16.attention.dense.weight', 'layers.20.attention.dense.bias', 'layers.7.attention.query_key_value.bias', 'layers.13.mlp.dense_4h_to_h.weight', 'layers.7.input_layernorm.weight', 'layers.5.attention.query_key_value.weight', 'layers.25.attention.query_key_value.weight', 'layers.14.input_layernorm.weight', 'layers.16.attention.rotary_emb.inv_freq', 'layers.25.attention.dense.weight', 'layers.26.attention.dense.weight', 'layers.12.attention.query_key_value.weight', 'layers.3.mlp.dense_h_to_4h.bias', 'layers.22.attention.query_key_value.bias', 'layers.6.input_layernorm.bias', 'layers.1.attention.dense.bias', 'layers.11.post_attention_layernorm.bias', 'layers.13.input_layernorm.weight', 'layers.5.input_layernorm.weight', 'layers.2.mlp.dense_4h_to_h.bias', 'word_embeddings.weight', 'layers.13.mlp.dense_h_to_4h.weight', 'layers.10.input_layernorm.weight', 'layers.18.attention.dense.bias', 'layers.24.post_attention_layernorm.bias', 'layers.3.attention.query_key_value.weight', 'layers.10.attention.rotary_emb.inv_freq', 'layers.12.attention.rotary_emb.inv_freq', 'layers.16.attention.query_key_value.weight', 'layers.23.input_layernorm.bias', 'layers.10.mlp.dense_h_to_4h.weight', 'layers.19.mlp.dense_h_to_4h.bias', 'layers.18.mlp.dense_4h_to_h.weight', 'layers.9.mlp.dense_h_to_4h.weight', 'layers.22.mlp.dense_h_to_4h.weight', 'layers.7.mlp.dense_4h_to_h.bias', 'layers.14.attention.dense.weight', 'layers.4.attention.rotary_emb.inv_freq', 'layers.17.input_layernorm.weight', 'layers.2.input_layernorm.weight', 'layers.6.attention.dense.weight', 'layers.9.mlp.dense_h_to_4h.bias', 'layers.21.mlp.dense_4h_to_h.bias', 'layers.4.post_attention_layernorm.weight', 'layers.21.mlp.dense_4h_to_h.weight', 'layers.27.input_layernorm.weight', 'layers.11.attention.dense.bias', 'layers.5.input_layernorm.bias', 'layers.1.attention.dense.weight', 'final_layernorm.bias', 'layers.1.attention.query_key_value.bias', 'layers.2.attention.query_key_value.bias', 'layers.10.attention.dense.bias', 'layers.15.attention.dense.weight', 'layers.16.input_layernorm.bias', 'layers.2.mlp.dense_4h_to_h.weight', 'layers.12.post_attention_layernorm.weight', 'layers.17.mlp.dense_h_to_4h.weight', 'layers.17.mlp.dense_4h_to_h.bias', 'layers.8.attention.query_key_value.weight', 'layers.17.post_attention_layernorm.weight', 'layers.7.mlp.dense_4h_to_h.weight', 'layers.21.attention.dense.weight', 'layers.27.mlp.dense_4h_to_h.bias', 'layers.9.attention.dense.bias', 'layers.19.attention.rotary_emb.inv_freq', 'layers.13.attention.rotary_emb.inv_freq', 'layers.5.attention.query_key_value.bias', 'layers.8.input_layernorm.bias', 'layers.6.mlp.dense_4h_to_h.bias', 'layers.20.attention.query_key_value.weight', 'layers.25.attention.query_key_value.bias', 'layers.23.input_layernorm.weight', 'layers.1.mlp.dense_4h_to_h.bias', 'layers.22.input_layernorm.bias', 'layers.16.post_attention_layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/thudm/.local/lib/python3.7/site-packages/peft/tuners/lora.py:174: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
  "fan_in_fan_out is set to True but the target module is not a Conv1D. "
user:你好啊
The dtype of attention mask (torch.int64) is not bool
bot:
```
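The `The dtype of attention mask (torch.int64) is not bool` line is only a warning, not the failure itself. As a minimal sketch (using a hypothetical standalone mask tensor, not ChatGLM's internal mask-building code), casting the mask to `bool` is what silences it:

```python
import torch

# Hypothetical example mask; ChatGLM-6B constructs its own mask internally,
# this only illustrates the dtype the warning is about.
attention_mask = torch.ones(1, 5, dtype=torch.int64)

# Cast to the dtype the attention code expects.
attention_mask = attention_mask.bool()
print(attention_mask.dtype)  # torch.bool
```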
Expected Behavior
Steps To Reproduce
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
You have to downgrade your version of the peft package to 0.2.0 for now. The new version is broken. See #1253
OK, with the peft package downgraded to 0.2.0, it seems to work :)
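Until a fixed release lands, one way to fail fast on the broken version is to check the installed peft version string before training. This is only a sketch; `parse_version` and `is_known_bad_peft` are illustrative helper names, not part of peft:

```python
def parse_version(v: str) -> tuple:
    """Parse the leading numeric components of a version string,
    e.g. '0.3.0.dev0' -> (0, 3, 0)."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break  # stop at suffixes like 'dev0'
    return tuple(parts)

def is_known_bad_peft(v: str) -> bool:
    """Per this thread, 0.3.0.dev releases produced no output with LoRA."""
    return parse_version(v) >= (0, 3, 0)

assert is_known_bad_peft("0.3.0.dev0")
assert not is_known_bad_peft("0.2.0")
```

In practice the actual workaround reported above is simply `pip install peft==0.2.0`.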
> You have to downgrade your version of the peft package to 0.2.0 for now. The new version is broken. See #1253
I'm facing the same issue. I just noticed ANOTHER weird bug with a lower version of peft: I installed peft 0.2.0, and it reports "Attempting to unscale FP16 gradients". What's annoying is that after I uninstalled it and switched back to the previous version of peft, "Attempting to unscale FP16 gradients" still occurs.
```
  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/trainer.py", line 1962, in _inner_training_loop
    self.scaler.unscale_(self.optimizer)
  File "/usr/local/lib/python3.9/dist-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.9/dist-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
```
Now it seems that there's no suitable version for me to use.
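For reference, the `Attempting to unscale FP16 gradients.` error comes from `torch.cuda.amp.GradScaler`, which refuses to unscale gradients of parameters stored in fp16. A common workaround, sketched below under the assumption that only the trainable (e.g. LoRA) parameters need upcasting while the frozen base model stays in half precision; this is not the thread's confirmed fix:

```python
import torch
import torch.nn as nn

def upcast_trainable_params(model: nn.Module) -> None:
    """Cast trainable fp16 parameters to fp32 so GradScaler can unscale
    their gradients; frozen fp16 parameters are left untouched."""
    for p in model.parameters():
        if p.requires_grad and p.dtype == torch.float16:
            p.data = p.data.float()

# Toy stand-in for a half-precision model with trainable parameters.
model = nn.Linear(4, 4).half()
upcast_trainable_params(model)
assert all(p.dtype == torch.float32 for p in model.parameters())
```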
> I just noticed ANOTHER weird bug with a lower version of peft
Are you sure this is caused by the peft version and not because you updated the text-generation-webui?
Yes. I have encountered this problem TWICE. Each time I made no other changes at all, just downgraded peft to 0.2.0 from 0.3.0.dev. Because of that, I had to tear down my WHOLE environment and rebuild it for my work.
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.