
After switching to the new model and configuration files, no content is generated

Open cywjava opened this issue 1 year ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

After switching to the new model and configuration files, no content is generated. The startup log reads (the beginning of the unused-weights list is cut off in the paste):

.bias', 'layers.7.mlp.dense_h_to_4h.weight', 'layers.19.attention.query_key_value.bias', 'layers.19.post_attention_layernorm.bias', 'layers.4.post_attention_layernorm.bias', 'layers.6.attention.query_key_value.bias', 'layers.12.attention.query_key_value.bias', 'layers.5.attention.dense.weight', 'layers.17.attention.query_key_value.weight', 'layers.12.input_layernorm.weight', 'layers.15.post_attention_layernorm.bias', 'layers.8.post_attention_layernorm.weight', 'layers.16.mlp.dense_h_to_4h.weight', 'layers.2.attention.query_key_value.weight', 'layers.14.input_layernorm.bias', 'layers.1.post_attention_layernorm.weight', 'layers.16.attention.dense.weight', 'layers.20.attention.dense.bias', 'layers.7.attention.query_key_value.bias', 'layers.13.mlp.dense_4h_to_h.weight', 'layers.7.input_layernorm.weight', 'layers.5.attention.query_key_value.weight', 'layers.25.attention.query_key_value.weight', 'layers.14.input_layernorm.weight', 'layers.16.attention.rotary_emb.inv_freq', 'layers.25.attention.dense.weight', 'layers.26.attention.dense.weight', 'layers.12.attention.query_key_value.weight', 'layers.3.mlp.dense_h_to_4h.bias', 'layers.22.attention.query_key_value.bias', 'layers.6.input_layernorm.bias', 'layers.1.attention.dense.bias', 'layers.11.post_attention_layernorm.bias', 'layers.13.input_layernorm.weight', 'layers.5.input_layernorm.weight', 'layers.2.mlp.dense_4h_to_h.bias', 'word_embeddings.weight', 'layers.13.mlp.dense_h_to_4h.weight', 'layers.10.input_layernorm.weight', 'layers.18.attention.dense.bias', 'layers.24.post_attention_layernorm.bias', 'layers.3.attention.query_key_value.weight', 'layers.10.attention.rotary_emb.inv_freq', 'layers.12.attention.rotary_emb.inv_freq', 'layers.16.attention.query_key_value.weight', 'layers.23.input_layernorm.bias', 'layers.10.mlp.dense_h_to_4h.weight', 'layers.19.mlp.dense_h_to_4h.bias', 'layers.18.mlp.dense_4h_to_h.weight', 'layers.9.mlp.dense_h_to_4h.weight', 'layers.22.mlp.dense_h_to_4h.weight', 'layers.7.mlp.dense_4h_to_h.bias', 'layers.14.attention.dense.weight', 'layers.4.attention.rotary_emb.inv_freq', 'layers.17.input_layernorm.weight', 'layers.2.input_layernorm.weight', 'layers.6.attention.dense.weight', 'layers.9.mlp.dense_h_to_4h.bias', 'layers.21.mlp.dense_4h_to_h.bias', 'layers.4.post_attention_layernorm.weight', 'layers.21.mlp.dense_4h_to_h.weight', 'layers.27.input_layernorm.weight', 'layers.11.attention.dense.bias', 'layers.5.input_layernorm.bias', 'layers.1.attention.dense.weight', 'final_layernorm.bias', 'layers.1.attention.query_key_value.bias', 'layers.2.attention.query_key_value.bias', 'layers.10.attention.dense.bias', 'layers.15.attention.dense.weight', 'layers.16.input_layernorm.bias', 'layers.2.mlp.dense_4h_to_h.weight', 'layers.12.post_attention_layernorm.weight', 'layers.17.mlp.dense_h_to_4h.weight', 'layers.17.mlp.dense_4h_to_h.bias', 'layers.8.attention.query_key_value.weight', 'layers.17.post_attention_layernorm.weight', 'layers.7.mlp.dense_4h_to_h.weight', 'layers.21.attention.dense.weight', 'layers.27.mlp.dense_4h_to_h.bias', 'layers.9.attention.dense.bias', 'layers.19.attention.rotary_emb.inv_freq', 'layers.13.attention.rotary_emb.inv_freq', 'layers.5.attention.query_key_value.bias', 'layers.8.input_layernorm.bias', 'layers.6.mlp.dense_4h_to_h.bias', 'layers.20.attention.query_key_value.weight', 'layers.25.attention.query_key_value.bias', 'layers.23.input_layernorm.weight', 'layers.1.mlp.dense_4h_to_h.bias', 'layers.22.input_layernorm.bias', 'layers.16.post_attention_layernorm.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

/home/thudm/.local/lib/python3.7/site-packages/peft/tuners/lora.py:174: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
  "fan_in_fan_out is set to True but the target module is not a Conv1D. "

user:你好啊
The dtype of attention mask (torch.int64) is not bool
bot:
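For reference, the loading step here roughly follows the pattern below (a minimal sketch; the model path, LoRA checkpoint path, and the chat call are assumptions, not the exact script in use). When the checkpoint keys listed above do not match the model, transformers leaves those layers randomly initialized and prints the "You should probably TRAIN this model..." notice, which is one way generation can come back empty.

```python
# Minimal loading sketch -- paths and precision are assumptions, not taken from this issue.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_path = "THUDM/chatglm-6b"          # base ChatGLM-6B weights (assumed)
lora_path = "./output/lora-checkpoint"  # fine-tuned LoRA adapter (assumed)

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModel.from_pretrained(base_path, trust_remote_code=True).half().cuda()

# Attach the LoRA weights; if their keys do not line up with the base model,
# the affected layers stay randomly initialized and the warning above is printed.
model = PeftModel.from_pretrained(model, lora_path)
model.eval()

response, history = model.chat(tokenizer, "你好啊", history=[])
print(response)
```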

Expected Behavior

Same log as under Current Behavior above.

Steps To Reproduce

Same log as under Current Behavior above.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

Same log as under Current Behavior above.

cywjava avatar Apr 16 '23 11:04 cywjava

You have to downgrade your version of the peft package to 0.2.0 for now. The new version is broken. See #1253
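For example, a quick check along these lines (illustrative only, not part of the repo) confirms which peft version is actually being imported; if it is newer than 0.2.0, downgrade with `pip install peft==0.2.0`:

```python
# Illustrative version guard: make sure the interpreter is really using peft 0.2.0.
import peft

print("peft version:", peft.__version__)
assert peft.__version__.startswith("0.2"), "downgrade with: pip install peft==0.2.0"
```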

myluki2000 avatar Apr 16 '23 11:04 myluki2000

OK, with the peft package downgraded to 0.2.0, it seems to work :)

cibernicola avatar Apr 16 '23 12:04 cibernicola

You have to downgrade your version of the peft package to 0.2.0 for now. The new version is broken. See #1253

I'm facing the same issue. I've just noticed ANOTHER weird bug with a lower version of peft: I installed peft 0.2.0, and it reports "Attempting to unscale FP16 gradients". What's annoying is that after I uninstalled it and switched back to the previous version of peft, the "Attempting to unscale FP16 gradients" error still occurred.

  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/trainer.py", line 1962, in _inner_training_loop
    self.scaler.unscale_(self.optimizer)
  File "/usr/local/lib/python3.9/dist-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.9/dist-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.

Now it seems that there's no suitable version for me to use.
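For context, that ValueError comes from PyTorch's GradScaler, which refuses to unscale gradients that are already FP16; it typically means the trainable parameters themselves were left in half precision while fp16 mixed-precision training was enabled. A stripped-down illustration of the underlying check (not the actual training script) looks like this:

```python
# Illustration of the GradScaler check behind the traceback above (requires a CUDA GPU).
import torch

model = torch.nn.Linear(8, 8).cuda().half()        # FP16 parameters -> FP16 gradients
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()

out = model(torch.randn(4, 8, device="cuda", dtype=torch.float16))
scaler.scale(out.sum()).backward()
scaler.unscale_(optimizer)  # raises ValueError: Attempting to unscale FP16 gradients.
```

Keeping the trainable parameters (for example, the LoRA weights) in FP32 is the usual way to avoid this.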

qiguanqiang avatar Apr 21 '23 18:04 qiguanqiang

I've just noticed ANOTHER weird bug with a lower version of peft

Are you sure this is caused by the peft version and not because you updated the text-generation-webui?

myluki2000 avatar Apr 21 '23 22:04 myluki2000

Yes. I have encountered this problem TWICE. Each time I didn't make any other changes at all; I just installed peft==0.2.0 in place of 0.3.0.dev. Because of that, I had to destroy my WHOLE environment and rebuild it for my work.
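Before tearing down the whole environment, it may be worth checking which copies of the packages the interpreter actually imports; the traceback above mixes /home/tiger/.local and /usr/local/lib site-packages, so an older install in one location can shadow the version just installed in the other. Something like this (purely illustrative) prints what is really loaded:

```python
# Illustrative diagnostic: show the version and install location of each relevant package.
import peft
import torch
import transformers

for mod in (peft, transformers, torch):
    print(mod.__name__, getattr(mod, "__version__", "?"), "->", mod.__file__)
```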

qiguanqiang avatar Apr 22 '23 16:04 qiguanqiang

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

github-actions[bot] avatar May 23 '23 23:05 github-actions[bot]