AmazingJ
The official DeBERTa code is extremely heavy on CPU memory and is not suitable for pretraining tasks.
A few questions: 1. When you run the sh file directly and initialize a DeBERTa model from scratch, which tokenizer is used? The sh file does not specify one via any parameter. Which vocabulary is used? 2. If I want to do continued training, what should I do? Is it enough to modify the load_ckpt_path parameter in the .sh file?
@stefan-it But the paper reports using hundreds of GB of data. How did they do it?
First, you need to linearize the AMR graph. You can use Konstas's script or Song's script, because the final performance...
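In case a concrete illustration helps, here is a minimal hand-rolled linearization sketch. This is not Konstas's or Song's actual script; the regex-based variable stripping and the example AMR are my own simplification:

```python
import re

def linearize(penman_str: str) -> str:
    # Drop variable prefixes, e.g. "(w / want-01" -> "(want-01";
    # re-entrant references like ":ARG0 b" are left as bare variables.
    no_vars = re.sub(r"\(\s*\S+\s*/\s*", "(", penman_str)
    # Put spaces around parentheses so they become separate tokens
    spaced = re.sub(r"([()])", r" \1 ", no_vars)
    return " ".join(spaced.split())

amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
print(linearize(amr))
# -> ( want-01 :ARG0 ( boy ) :ARG1 ( go-02 :ARG0 b ) )
```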
Python has an [anytree](https://pypi.org/project/anytree/2.1.4/) package. You can try it.
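A quick sketch of what anytree looks like in practice (the node names here are made up for illustration):

```python
from anytree import Node, RenderTree

# Build a small tree by attaching children to parents
root = Node("root")
a = Node("a", parent=root)
Node("a1", parent=a)
Node("b", parent=root)

# RenderTree yields (prefix, fill, node) tuples for pretty-printing
for pre, _, node in RenderTree(root):
    print(f"{pre}{node.name}")
```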
After deleting "@@ ", the BLEU score should not drop; it should rise considerably. Are you sure you are running the BPE process correctly? It is worth noting that not...
What I mean is that both the source and target segments need BPE during training, while the target segment does not need BPE during testing. BPE is...
Yes. During testing, only the source side needs BPE; then compute BLEU after deleting the "@@ " markers.
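For reference, a minimal sketch of that post-processing step, assuming subword-nmt style "@@ " continuation markers (the example sentence is made up):

```python
def remove_bpe(line: str) -> str:
    # "@@ " marks a subword that continues into the next token;
    # a bare trailing "@@" can also occur at the end of a line
    return line.replace("@@ ", "").replace("@@", "")

hyp = "the boy wan@@ ts to go ho@@ me"
print(remove_bpe(hyp))  # -> "the boy wants to go home"
```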
On LDC2015E86: 10000
On LDC2017T10: 20000
train_file: cat train_source + train_target
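Assuming those numbers are BPE merge operations, a sketch of learning the codes on the concatenated file with the subword-nmt Python API (file names are placeholders):

```python
# Sketch: learn BPE merges on train_file = cat train_source + train_target,
# assuming the subword-nmt package (pip install subword-nmt).
import codecs
from subword_nmt.learn_bpe import learn_bpe

with codecs.open("train_file", encoding="utf-8") as infile, \
     codecs.open("codes.bpe", "w", encoding="utf-8") as outfile:
    # 10000 merge operations for LDC2015E86 (20000 for LDC2017T10)
    learn_bpe(infile, outfile, num_symbols=10000)
```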