Heng Cai
Sorry, for the time being there is no plan to release the trained checkpoints. You can train the model yourself, and the results can be reproduced in less than...
Yes. At the sentence level, a sample counts as a positive only when the entire sentence is corrected to match the ground truth; it is a sentence-granularity statistic.
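For concreteness, here is a minimal sketch of one common sentence-level counting scheme. The names `srcs`, `preds`, and `tgts` are illustrative, not the repo's actual API; the repo's evaluation script remains the source of truth for the exact bookkeeping.

```python
def sentence_level_counts(srcs, preds, tgts):
    """Sentence-level P/R/F1: a prediction is a true positive only if the
    whole corrected sentence exactly matches the ground truth."""
    tp = fp = fn = 0
    for src, pred, tgt in zip(srcs, preds, tgts):
        changed = pred != src   # the model proposed some correction
        correct = pred == tgt   # the entire sentence matches the target
        if changed and correct:
            tp += 1             # corrected sentence exactly matches ground truth
        elif changed:
            fp += 1             # model edited the sentence but got it wrong
        elif src != tgt:
            fn += 1             # sentence needed correction but was left as-is
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```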
It hasn't been written yet. You can decode the `predict` output generated in the evaluation code with the BERT tokenizer to get the predictions.
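A minimal sketch of that decoding step with Hugging Face `transformers`. The checkpoint name is an assumption (use whatever the training run used), and `pred_ids` stands in for the argmax token ids produced during evaluation:

```python
from transformers import BertTokenizer

# Assumed checkpoint; substitute the one used for training.
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-base")

# Placeholder for the predicted token ids from the evaluation step.
pred_ids = tokenizer("今天天气很好", return_tensors="pt")["input_ids"]

texts = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
# BERT tokenizes Chinese one character per token, so decoded text has
# spaces between characters; strip them to recover the sentence.
texts = [t.replace(" ", "") for t in texts]
print(texts)
```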
This does not affect the correction inference.
The data-processing script in this repo has some issues. You can process the data with [BertBasedCorrectionModels](https://github.com/gitabtion/BertBasedCorrectionModels) first, then train with this repo.
This error is caused by an encoding problem in the data. It has been fixed in my other project, [BertBasedCorrectionModels](https://github.com/gitabtion/BertBasedCorrectionModels); you can use that project to process the data, and then copy...
I will fix this issue soon, thanks.
Thanks for your attention. You can replace the Chinese Macbert with a model such as [bert-base-uncased](https://huggingface.co/bert-base-uncased).
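A minimal sketch of the swap, using Hugging Face `transformers`; the exact config key or model class in this repo may differ. Note that the tokenizer must change together with the backbone so the vocabulary matches your data's language:

```python
from transformers import BertForMaskedLM, BertTokenizer

model_name = "bert-base-uncased"  # instead of the Chinese Macbert checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)
```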
I'm curious how precision and F1 can be computed when the evaluation data contains no negative samples. Both the data and the evaluation script are open source, so you can run the evaluation yourself, or re-evaluate with the corresponding functions in the pycorrector repo: [pycorrector macbert4csc](https://github.com/shibing624/pycorrector/blob/master/examples/macbert/README.md). One extra note on the metric gains: before this implementation, very few implementations kept the pretrained MLMHead weights when fine-tuning, whereas both this repo and the BBCM repo keep that layer's pretrained parameters during training. If you are interested, you could run an ablation that does not load the pretrained parameters of that layer and see whether the metrics drop into the range you expect.
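If you want to try that ablation, here is a hedged sketch against the Hugging Face `BertForMaskedLM` layout; the repo's own model class may organize the head differently, and the checkpoint name is an assumption. It re-initializes the MLM head's transform sub-layer and decoder bias while leaving the decoder weight alone, since in HF BERT that weight shares storage with the input embeddings:

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("hfl/chinese-macbert-base")  # assumed checkpoint

def reinit_mlm_head(model):
    """Drop the pretrained MLMHead parameters before fine-tuning."""
    std = model.config.initializer_range
    for module in model.cls.predictions.transform.modules():
        if isinstance(module, torch.nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            module.bias.data.zero_()
        elif isinstance(module, torch.nn.LayerNorm):
            module.weight.data.fill_(1.0)
            module.bias.data.zero_()
    # The decoder weight is tied to the word embeddings, so only its bias is reset.
    model.cls.predictions.bias.data.zero_()

reinit_mlm_head(model)
# ...then fine-tune exactly as before and compare the metrics.
```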