Yushi Bai

102 comments of Yushi Bai

Since different models use different tokenizers, we report the avg length under a unified length measure: character count for the Chinese datasets and word count for the English datasets.
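For concreteness, here is a minimal sketch of that length measure (my own illustration, not code from the repository; function names are placeholders):

```python
# Sketch of the reported length measure: character count for Chinese,
# whitespace-split word count for English.
def sample_length(text: str, language: str) -> int:
    if language == "zh":
        return len(text)          # Chinese: number of characters
    return len(text.split())      # English: number of words

def avg_length(texts, language):
    return sum(sample_length(t, language) for t in texts) / len(texts)
```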

Thanks for your attention. Our paper is currently under review, so we are not open to result submissions at the moment. Nevertheless, we encourage you to add these evaluation results to your...

I guess there is a difference in the evaluation setting. In our experiment, we measure the prediction accuracy **only on the tail entity (t)** in the test triplets, following previous...
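To make the setting concrete, here is a hypothetical sketch of tail-only Hits@1 evaluation; `model_score` is an assumed scoring interface, not the project's actual API:

```python
import torch

def tail_hits_at_1(model_score, test_triples, num_entities):
    """Accuracy when predicting only the tail entity t of each test triple (h, r, t).

    model_score(h, r, candidates) is assumed to return a plausibility score
    for every candidate tail entity (higher = more plausible).
    """
    correct = 0
    for h, r, t in test_triples:
        candidates = torch.arange(num_entities)
        scores = model_score(h, r, candidates)
        correct += int(scores.argmax().item() == t)
    return correct / len(test_triples)
```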

Correct me if I'm wrong, but I think MetaQA mainly focuses on testing how well a QA system can take a natural-language (NL) form multi-hop query and ground the NL...

I will look into the GPU usage issue when iterative training is activated. The hyperparameters for all 4 datasets are shown in the command lines in README.md; you will...

BTW, you can also find the best hyperparameters in Table 8 of our paper (appendix).

Good question! In SFT, the loss is usually computed as a token-level mean within each sample and then a sequence-level mean across samples, which is exactly what Equation (2) computes. If you instead took a token-level mean across different samples, samples with more target tokens would carry more weight (effectively being upsampled), introducing an imbalance between samples. If you want the total loss computed as "sum of target-token losses / total number of target tokens" as you describe, simply replace the per-sample token-level mean in the code with a sum over each sample's target-token losses.
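As a rough illustration (not the repository's actual code), the two aggregation schemes differ only in how the per-token losses are reduced; the shapes below follow the usual SFT convention with -100 marking non-target tokens, and the label shift is omitted:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, per_sample_mean=True):
    """Contrast the two aggregation schemes (illustrative sketch).

    logits: (batch, seq_len, vocab); labels: (batch, seq_len), -100 on non-target tokens.
    """
    per_sample, total_loss, total_tokens = [], 0.0, 0
    for sample_logits, sample_labels in zip(logits, labels):
        token_loss = F.cross_entropy(
            sample_logits, sample_labels, ignore_index=-100, reduction="none"
        )
        mask = sample_labels != -100
        per_sample.append(token_loss[mask].mean())  # token-level mean within the sample
        total_loss += token_loss[mask].sum()
        total_tokens += mask.sum()

    if per_sample_mean:
        # Equation (2): sequence-level mean across samples
        return torch.stack(per_sample).mean()
    # Alternative: total target-token loss / total number of target tokens,
    # which upweights samples that have more target tokens
    return total_loss / total_tokens
```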

When loss weighting is used with packed training, `self.pack_loss` is set to `True`; please refer to the code under `if self.pack_loss:`. We first multiply the loss on each target token within a sample by its weight and take the sum, and the losses of different samples across GPUs are then averaged in `transformers.Trainer`.
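A hedged sketch of that weighted sum for a single packed sequence (all names other than `pack_loss` are placeholders of mine, not the repository's identifiers):

```python
import torch.nn.functional as F

def packed_weighted_loss(logits, labels, weights, pack_loss=True):
    """Sketch of the per-token weighted sum used when packing is enabled.

    logits: (seq_len, vocab) for one packed sequence; labels: (seq_len,) with -100
    on non-target tokens; weights: (seq_len,) per-token loss weights.
    """
    token_loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
    if pack_loss:
        # Weight each target token's loss and sum within the packed sequence;
        # averaging across samples/GPUs is then handled by transformers.Trainer.
        return (token_loss * weights).sum()
    return token_loss[labels != -100].mean()
```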

Seems like this PR also aims to mitigate the gradient accumulation issue in `transformers`: https://github.com/huggingface/transformers/pull/34191

Hello, I'm sorry that I can no longer find the environment I used at the time. Besides basics such as torch and numpy, installing the transformers library should be enough; any relatively recent version should work. If you run into specific environment issues, feel free to ask!