Cao Yiwei

4 comments by Cao Yiwei

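If the classification layer does not apply the attention mask, training still attends to the padding positions, and at inference time the results with and without padding differ slightly. The effect on accuracy is not large, though, and the model also learns that the padding embedding carries no useful information.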

> Yes, the missing mask in the classification layer is a bug in the code. I'll fix it, thanks~

OK, sounds good.
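To illustrate why the missing mask makes padded and unpadded inputs disagree, here is a minimal pure-Python sketch of a masked softmax over attention scores. It is not the repository's code: `masked_softmax` and the example scores are hypothetical, and the scores stand in for whatever the classification layer actually computes.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` that ignores positions where `mask` is 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("mask must keep at least one position")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Same three real tokens, once without padding and once with two padded slots.
unpadded = masked_softmax([2.0, 1.0, 0.5], [1, 1, 1])
padded = masked_softmax([2.0, 1.0, 0.5, 0.0, 0.0], [1, 1, 1, 0, 0])

# Without a mask, the padded slots take part of the attention weight.
unmasked = masked_softmax([2.0, 1.0, 0.5, 0.0, 0.0], [1, 1, 1, 1, 1])
```

With the mask, the weights on the real tokens are the same whether or not the sequence is padded. Without it, the padded slots absorb some weight, which is consistent with the small padded versus unpadded differences described above.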

One more issue: the batch inference code does not run. I modified it, and I have tested that the version below works.

```python
if inference:
    batch_size = hidden_states.shape[0]
    nb_class = int(self.num_class.item())
    nb_classifiers = len(self.layer_classifiers)
    device = hidden_states.device
    # positions will keep track of the original position of each element in the...
```

Found another bug... In the `init_adam_optimizer` function in utils.py, `optimizer_parameters` is initialized incorrectly: the `no_decay` group always ends up with 0 parameters. Change it to:

```python
optimizer_parameters = [
    {"params": [p for param_name, p in model.named_parameters()
                if not any(name in param_name for name in no_decay)],
     "weight_decay_rate": 0.01},
    {"params": [p for param_name, p in...
```
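The substring-based grouping used in the fix can be sketched without any framework. Everything below is illustrative: `split_by_weight_decay`, the parameter names, the contents of `no_decay`, and the `0.0` rate for the second group are assumptions, and plain strings stand in for tensors. The exact-membership test at the end is one plausible way a `no_decay` group can come out empty; the comment above does not say what the original condition was.

```python
def split_by_weight_decay(named_params, no_decay, decay_rate=0.01):
    """Split (name, param) pairs into decay and no-decay optimizer groups.

    A parameter goes to the no-decay group when any entry of `no_decay`
    occurs as a substring of its full name.
    """
    decay, skip = [], []
    for param_name, p in named_params:
        if any(name in param_name for name in no_decay):
            skip.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay_rate": decay_rate},
        {"params": skip, "weight_decay_rate": 0.0},  # assumed rate
    ]


# Hypothetical parameter names; strings stand in for tensors.
named_params = [
    ("encoder.layer.0.attention.weight", "w0"),
    ("encoder.layer.0.attention.bias", "b0"),
    ("encoder.layer.0.LayerNorm.weight", "ln_w"),
    ("classifier.weight", "w1"),
]
no_decay = ["bias", "LayerNorm.weight"]  # assumed contents

groups = split_by_weight_decay(named_params, no_decay)

# Exact membership compares the full dotted name against short suffixes,
# so it never matches and the resulting list is empty.
exact_match = [p for param_name, p in named_params if param_name in no_decay]
```

A quick check such as `len(groups[1]["params"])` after building the groups makes this kind of silent misconfiguration easy to catch.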