Xiaoqing Zhou

10 comments by Xiaoqing Zhou

> Actually, the coverage mechanism isn't implemented for transformer decoders. Coverage comes from See 2018, which is based on RNNs (LSTMs, actually) and therefore a single attention head. > > ...

@Sengxian 1. Could you share the basic training settings? For example, the model parallel size `mp` and the ZeRO `stage` used in training, the minimum number of nodes...

@Sengxian Hi, has this plan been delayed? > maybe later this month.

@peregilk Have you tried it?

It would be even better if the datasets used in each paper were noted. There are quite a few datasets in this area, and different papers use different ones.

@Remorax Have you solved this problem?

You can try these options: `--eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe`.
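These look like fairseq's validation-time BLEU options; note that `--eval-bleu-args` and `--eval-bleu-detok` each take a value. Below is a minimal sketch of how they might be combined in a `fairseq-train` call. The data path, architecture, and concrete values are placeholders of mine, not something taken from the original thread.

```sh
# Sketch only: the data-bin path, --arch and the concrete values are placeholders.
# Combine with your usual optimizer/lr/criterion options.
fairseq-train data-bin/my-dataset \
    --task translation --arch transformer \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric
```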

> > Oh, you can try:
> > 1. Load the retrained model I released and fine-tune it.
> > 2. Fine-tuning parameters: lr=2e-5, batch=8, epoch=2.
> > 3. max_answer_length=384, --version_2_with_negative.
>
> I loaded the luhua/chinese_pretrain_mrc_macbert_large and luhua/chinese_pretrain_mrc_roberta_wwm_ext_large model weights and fine-tuned them on the DuReader 2021 competition dataset. The hyperparameters are basically the same as in the train_bert.sh script, except for the batch size: lr=2e-5, batch=4, epoch=2, max_answer_length=384, --version_2_with_negative, and the source code was not modified. On a 130-example validation set, the F1 over 10 evaluations during training was [52.3077, 48.3666, 43.7441, 47.1442, 48.6656, 49.2478, 46.6051, 47.4777, 48.425, 47.5991], i.e. it stays around 40+, and I haven't found the cause. Do you have any ideas I could look into? With batch=4 the gap from your results shouldn't be this large.

@kangyishuai Have you tried fine-tuning the author's released models on the CMRC dataset? Does the performance improve further, as the author said?
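For concreteness, here is a hypothetical sketch of how those hyperparameters would look as a fine-tuning command in the style of HuggingFace's `run_squad.py` example. Apart from the values quoted in the thread (lr=2e-5, batch=4, epoch=2, max_answer_length=384, --version_2_with_negative) and the model names, every path and flag name is an assumption of mine, and the repo's own `train_bert.sh` may use different options.

```sh
# Hypothetical fine-tuning command; paths and flag names other than those
# mentioned in the thread are placeholders, not the repo's actual script.
python run_squad.py \
    --model_name_or_path luhua/chinese_pretrain_mrc_macbert_large \
    --do_train --do_eval \
    --train_file train.json \
    --predict_file dev.json \
    --learning_rate 2e-5 \
    --per_gpu_train_batch_size 4 \
    --num_train_epochs 2 \
    --max_answer_length 384 \
    --version_2_with_negative \
    --output_dir output/
```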

@GryffindorLi @chris-aeviator I think it is possible to support multiple categories. As I understand it, PET can be used with multiple tokens, so it should be possible to adapt it...

This tool is great, and it would be even more useful for Chinese tasks if it supported more Chinese augmentation types.