ChatGLM-6B
When fine-tuning with p-tuning/train.sh, is it possible to freeze all parameters of the prefix_encoder layer and fine-tune the parameters of the model's other Blocks?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Starting at line 850 of p-tuning/modeling_chatglm.py I added the following code:

```python
if self.pre_seq_len is not None:
    for param in self.parameters():
        param.requires_grad = False
    self.prefix_tokens = torch.arange(self.pre_seq_len).long()
    self.prefix_encoder = PrefixEncoder(config)
    self.dropout = torch.nn.Dropout(0.1)

    for k, v in self.prefix_encoder.named_parameters():
        v.requires_grad = False
    for k, v in self.layers[0].named_parameters():
        v.requires_grad = True
```

Continuing fine-tuning with this change causes a gradient explosion and the loss becomes NaN. (The LR was changed to 1e-4, the value used for full fine-tuning.)
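For reference, the freeze-then-unfreeze pattern above can be sketched on a toy module; the `prefix_encoder`/`layers` names mirror modeling_chatglm.py, but the modules, sizes, and the 1e-5 learning rate here are illustrative assumptions, not the repo's actual configuration:

```python
import torch

# Hypothetical stand-in for the model: a "prefix_encoder" plus
# transformer-style "layers" (names mirror modeling_chatglm.py).
model = torch.nn.ModuleDict({
    "prefix_encoder": torch.nn.Linear(4, 4),
    "layers": torch.nn.ModuleList([torch.nn.Linear(4, 4) for _ in range(2)]),
})

# Freeze everything first, then re-enable only the block to fine-tune.
for param in model.parameters():
    param.requires_grad = False
for param in model["layers"][0].parameters():
    param.requires_grad = True

# Build the optimizer from trainable parameters only. Reusing the full
# fine-tuning LR (1e-4) on a partially frozen model can destabilize
# training, so a smaller LR (assumed here) may be worth trying.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)

print(len(trainable))  # → 2: only layers[0]'s weight and bias remain trainable
```

Passing only the trainable parameters to the optimizer also avoids wasting optimizer state on frozen weights.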
Expected Behavior
No response
Steps To Reproduce
Starting at line 850 of p-tuning/modeling_chatglm.py, add the following code:

```python
for k, v in self.prefix_encoder.named_parameters():
    v.requires_grad = False
for k, v in self.layers[0].named_parameters():
    v.requires_grad = True
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response