bert_distill issues

Question about the results reported by distill.py

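First, thanks for sharing the code. I have a question about distill.py: the accuracy it prints at the end is measured on the dev set, yet teach_on_dev defaults to True, which effectively means the dev set is used for training. Wouldn't that inflate the reported test performance?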

letiantony
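
A score counts as held-out only if none of the evaluated examples were soft-labelled by the teacher and then trained on by the student. The sketch below shows that check. It assumes examples can be identified by index. split_indices, distillation_pool and check_no_leak are hypothetical helpers, not code from distill.py. teach_on_dev is modelled only by whether the dev split joins the distillation pool.

```python
import random


def split_indices(n, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle example indices and cut them into disjoint train/dev/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = idx[:n_test]
    dev = idx[n_test:n_test + n_dev]
    train = idx[n_test + n_dev:]
    return train, dev, test


def distillation_pool(train, dev, teach_on_dev):
    """Indices the teacher soft-labels for the student (hypothetical helper)."""
    return train + dev if teach_on_dev else list(train)


def check_no_leak(pool, eval_set):
    """Raise if any evaluation example was also used for distillation."""
    overlap = set(pool) & set(eval_set)
    if overlap:
        raise ValueError(f"{len(overlap)} evaluation examples were seen during distillation")


train, dev, test = split_indices(100)
pool = distillation_pool(train, dev, teach_on_dev=True)
check_no_leak(pool, test)  # passes: the test split stays held out
try:
    check_no_leak(pool, dev)  # fails: with teach_on_dev=True the dev split leaks
except ValueError as err:
    print(err)
```

With teach_on_dev=True, reporting accuracy on a separate test split avoids the overlap the question describes. Setting teach_on_dev=False and reporting on dev avoids it too.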

Bug at line 86 of utils.py

It should be "np.random.rand() > p_mask", not "np.random.rand() < p_mask".

timberswift
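
The right comparison depends on which branch the condition at line 86 guards, and that line is not quoted here. np.random.rand() draws uniformly from [0, 1), so `np.random.rand() < p_mask` is true with probability p_mask. It belongs on the branch that should fire with probability p_mask, and `>` belongs on the complementary branch. The sketch below assumes p_mask is meant as the probability of masking a token. mask_tokens is a hypothetical helper, not the function in utils.py, and it uses Python's random module in place of np.random.rand.

```python
import random


def mask_tokens(tokens, p_mask, mask_token="[MASK]", rng=None):
    """Replace each token with mask_token with probability p_mask (hypothetical helper)."""
    rng = rng if rng is not None else random.Random()
    out = []
    for tok in tokens:
        # rng.random() is uniform on [0, 1), so `< p_mask` is True with probability p_mask.
        out.append(mask_token if rng.random() < p_mask else tok)
    return out


rng = random.Random(0)
masked = mask_tokens(["w"] * 10000, p_mask=0.1, rng=rng)
print(masked.count("[MASK]") / len(masked))  # close to 0.1
```

If the guarded branch is the one that keeps the original token, the comparison flips to `>`. That would be consistent with the report above.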

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

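Hello, may I ask a question? When I run python ptbert.py I get the error above, raised at the line pooled_output = self.dropout(pooled_output). When I print pooled_output, it is the string 'pooler_output' rather than a tensor. That seems strange, given _, pooled_output = self.bert(input_ids, None, input_mask). Why does bert return 'pooler_output' as pooled_output? I can't tell what went wrong; could you point me in the right direction? Thanks a lot!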

luohm111
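
This symptom is consistent with running the code under a newer Hugging Face transformers release. There, BertModel returns a dict-like ModelOutput by default instead of a tuple. Tuple-unpacking a dict iterates over its keys, so the second item is the key string 'pooler_output'. The stdlib sketch below reproduces the mechanism with a stand-in class. FakeModelOutput and fake_bert are hypothetical and are not the library's classes.

```python
from collections import OrderedDict


class FakeModelOutput(OrderedDict):
    """Stand-in for a dict-like model output; NOT the real transformers class."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails: fall back to dict keys.
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


def fake_bert(input_ids):
    # Real code would return tensors; plain lists keep this sketch dependency-free.
    hidden = [[0.1, 0.2] for _ in input_ids]
    pooled = [0.5, 0.5]
    return FakeModelOutput(last_hidden_state=hidden, pooler_output=pooled)


out = fake_bert([101, 2023, 102])

# Tuple-unpacking a dict iterates over its KEYS, so this yields strings.
_, pooled_output = out
print(pooled_output)  # pooler_output

# Reading the field by name gives the actual value.
pooled_output = out.pooler_output
```

This assumes the installed library is transformers 4.x. In that case, passing return_dict=False to the model call restores tuple outputs. Alternatively, the value can be read as outputs.pooler_output.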

About the loss function

The paper mentions using logits, but the committed code uses the output after softmax. Is there a reason for this?

yubinml2019
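
One practical difference is that softmax saturates. Two sets of logits that disagree by a wide margin can map to nearly identical probabilities, so a loss on probabilities barely registers the gap. The sketch below assumes the loss in question is mean squared error, which the issue does not state. teacher_logits and student_logits are made-up numbers for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


teacher_logits = [10.0, 0.0]
student_logits = [5.0, 0.0]

loss_on_logits = mse(student_logits, teacher_logits)
loss_on_probs = mse(softmax(student_logits), softmax(teacher_logits))
print(loss_on_logits, loss_on_probs)
```

Here the logit loss is 12.5, while the probability loss is on the order of 1e-5. Both models pick class 0 with probability above 0.99, so the probability loss sees little difference even though the teacher is far more confident.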

Code question: BertModel arguments and the teacher's prediction input in distill.py

https://github.com/qiangsiwei/bert_distill/blob/ceed9c9455d70dde24990014945a382e290d61ff/ptbert.py#L103 Is input_mask passed in the wrong position in this call? As written, attention_mask would be None and token_type_ids would be input_mask. https://github.com/qiangsiwei/bert_distill/blob/master/distill.py#L22 Why are [CLS] and [SEP] not added when predicting here? Thanks for open-sourcing this; I hope you can clear up my confusion.

Daishijun
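
Both questions can be illustrated without loading a model. Positional arguments bind by parameter order. That order is assumed here to differ between two libraries:

- the older pytorch_pretrained_bert BertModel.forward: (input_ids, token_type_ids, attention_mask)
- the Hugging Face transformers one: (input_ids, attention_mask, token_type_ids)

If so, self.bert(input_ids, None, input_mask) sends input_mask to attention_mask under the former and to token_type_ids under the latter. Passing attention_mask= by keyword avoids the ambiguity. forward_old and forward_new are stand-ins that copy only the assumed parameter order.

The second part is a hypothetical build_bert_input helper that wraps a token list as [CLS] ... [SEP] and pads it. It assumes the teacher was fine-tuned on inputs in that format. In that case, predictions made during distillation would normally use the same format.

```python
import inspect


# Stand-ins copying only the ASSUMED parameter order of two BertModel.forward
# signatures: older pytorch_pretrained_bert vs. Hugging Face transformers 4.x.
def forward_old(input_ids, token_type_ids=None, attention_mask=None):
    pass


def forward_new(input_ids=None, attention_mask=None, token_type_ids=None):
    pass


def bind(fn, *args, **kwargs):
    """Show which parameter each argument would land in."""
    return dict(inspect.signature(fn).bind(*args, **kwargs).arguments)


ids, mask = "ids", "mask"
print(bind(forward_old, ids, None, mask))  # mask lands in attention_mask
print(bind(forward_new, ids, None, mask))  # mask lands in token_type_ids
# Keyword arguments land in the right place regardless of parameter order.
print(bind(forward_new, input_ids=ids, attention_mask=mask))


def build_bert_input(tokens, max_len):
    """Wrap tokens as [CLS] ... [SEP], pad to max_len, and build the attention mask."""
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    attn = [1] * len(tokens) + [0] * (max_len - len(tokens))
    tokens = tokens + ["[PAD]"] * (max_len - len(tokens))
    return tokens, attn


print(build_bert_input(["good", "movie"], 6))
```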