ChatGLM-6B
When fine-tuning with p-tuning/train.sh, is it possible to freeze all parameters of the prefix_encoder layer and fine-tune the parameters of the model's other Blocks?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Starting at line 850 of p-tuning/modeling_chatglm.py I added the following code:

```python
if self.pre_seq_len is not None:
    for param in self.parameters():
        param.requires_grad = False
    self.prefix_tokens = torch.arange(self.pre_seq_len).long()
    self.prefix_encoder = PrefixEncoder(config)
    self.dropout = torch.nn.Dropout(0.1)

    for k, v in self.prefix_encoder.named_parameters():
        v.requires_grad = False
    for k, v in self.layers[0].named_parameters():
        v.requires_grad = True
```

Continuing fine-tuning with this change causes a gradient explosion and the loss becomes NaN. (The LR was changed to 1e-4, the value used for full fine-tuning.)
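For reference, the freeze-then-unfreeze pattern above can be sketched on a toy module; the `prefix_encoder`/`layers` names mirror modeling_chatglm.py, but the modules, sizes, and the 1e-5 learning rate here are illustrative assumptions, not the repo's actual configuration:

```python
import torch

# Hypothetical stand-in for the model: a "prefix_encoder" plus
# transformer-style "layers" (names mirror modeling_chatglm.py).
model = torch.nn.ModuleDict({
    "prefix_encoder": torch.nn.Linear(4, 4),
    "layers": torch.nn.ModuleList([torch.nn.Linear(4, 4) for _ in range(2)]),
})

# Freeze everything first, then re-enable only the block to fine-tune.
for param in model.parameters():
    param.requires_grad = False
for param in model["layers"][0].parameters():
    param.requires_grad = True

# Build the optimizer from trainable parameters only. Reusing the full
# fine-tuning LR (1e-4) on a partially frozen model can destabilize
# training, so a smaller LR (assumed here) may be worth trying.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)

print(len(trainable))  # → 2: only layers[0]'s weight and bias remain trainable
```

Passing only the trainable parameters to the optimizer also avoids wasting optimizer state on frozen weights.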
Expected Behavior
No response
Steps To Reproduce
Starting at line 850 of p-tuning/modeling_chatglm.py, add the following code:

```python
for k, v in self.prefix_encoder.named_parameters():
    v.requires_grad = False
for k, v in self.layers[0].named_parameters():
    v.requires_grad = True
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response