Weijie Liu
> `labels = ['T', 'F']` should be added

Thanks for pointing this out; I have fixed the issue.
If there are too many labels, you can compute the uncertainty over only the top-N highest probabilities, e.g. N = 5.
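A minimal sketch of this idea, assuming the normalized-entropy uncertainty used in FastBERT; the top-N variant below is an illustration, not code from this repo:

```python
import math

def uncertainty(probs):
    """Normalized entropy of a probability distribution, in [0, 1]."""
    n = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n)

def top_n_uncertainty(probs, n=5):
    """Keep only the N largest probabilities, renormalize them,
    then compute the normalized entropy over that reduced
    distribution -- cheaper and more stable when the label set
    is large and most probabilities are near zero."""
    top = sorted(probs, reverse=True)[:n]
    total = sum(top)
    return uncertainty([p / total for p in top])
```

With many labels, the long tail of tiny probabilities inflates the entropy; restricting to the top N keeps the uncertainty focused on the plausible classes.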
We have set up a new server for downloading the models.
``--batch_size 32``
Hello! Could you describe this issue in more detail? E.g., how did you find that the attention operation is missing? In our ``run_fastbert.py``, we use the following code to obtain the...
Thank you for testing this! I will analyze it further and share my results as soon as possible.
After testing, we found that ``thop.profile`` does not count FLOPs for the ``torch.matmul()`` operation, so the FLOPs we reported omit the ``torch.matmul`` parts. Reference: https://discuss.pytorch.org/t/get-the-matmul-operations-in-a-net/61058 This is a mistake in our...
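For reference, the missing matmul FLOPs can be added back by hand. A minimal sketch in plain Python; the attention shapes below are illustrative, not measured from the model:

```python
def matmul_flops(m, k, n):
    """FLOPs of an (m x k) @ (k x n) matmul: each of the m*n outputs
    takes k multiplications and k - 1 additions, i.e. ~2*m*k*n."""
    return 2 * m * k * n

def attention_matmul_flops(seq_len, num_heads, head_dim):
    """FLOPs of the two matmuls inside one self-attention layer that
    thop.profile misses: Q @ K^T and softmax(scores) @ V."""
    scores = num_heads * matmul_flops(seq_len, head_dim, seq_len)   # Q @ K^T
    context = num_heads * matmul_flops(seq_len, seq_len, head_dim)  # scores @ V
    return scores + context
```

Summing this over all transformer layers gives the correction term to add to the ``thop.profile`` total.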
Batch prediction is not supported at the moment.
> line 234:
>
> ```python
> self._self_distillation(
>     sentences_train, batch_size, learning_rate, epochs_num,
>     warmup, report_steps, model_saving_pathm, sentences_dev,
>     labels_dev, dev_speed, verbose
> )
> ```
>
> `model_saving_pathm` should be `model_saving_path`, right?

Thanks for catching this problem. ...
> Hi, while reproducing your experiments (without any modifications), the accuracy increases steadily during backbone training, but during distillation the dev and test accuracy in every epoch is identical to the backbone's last epoch. Did I make a mistake somewhere?

What did you set the speed to during distillation? This looks like speed = 0.0, which sends every sample all the way to the backbone's final layer.
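The effect of the speed threshold can be sketched like this (a toy illustration of FastBERT's early-exit rule, not code from this repo; uncertainty is assumed to be normalized to [0, 1]):

```python
def exit_layer(uncertainties, speed):
    """Return the index of the first layer whose student classifier
    is confident enough (uncertainty < speed); otherwise fall through
    to the last layer. With speed = 0.0, no uncertainty in [0, 1] is
    ever below the threshold, so every sample reaches the final
    layer and the accuracy matches the backbone exactly."""
    for layer, u in enumerate(uncertainties):
        if u < speed:
            return layer
    return len(uncertainties) - 1
```

Raising speed above 0.0 lets confident samples exit early, which is where the dev/test accuracy starts to diverge from the backbone's.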