Yushi Bai
Since different models use different tokenizers, to unify the length measure, the avg length we report is the character count for Chinese datasets and the word count for English datasets.
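For concreteness, a minimal sketch of this counting convention (the helper name `avg_length` is hypothetical, not part of our release):

```python
def avg_length(samples, lang):
    """Average length as reported: character count for Chinese datasets,
    whitespace-separated word count for English datasets."""
    if lang == "zh":
        lengths = [len(s) for s in samples]           # number of characters
    else:
        lengths = [len(s.split()) for s in samples]   # number of words
    return sum(lengths) / len(lengths)
```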
Thanks for your attention. Our paper is currently under review, so we are not open to result submissions at the moment. Nevertheless, we encourage you to add these evaluation results to your...
I guess there is a difference in the evaluation setting. In our experiment, we measure the prediction accuracy **only on the tail entity (t)** in the test triplets, following previous...
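As a rough illustration of that setting (the function names here are hypothetical, not from our code), accuracy is computed only over tail-entity predictions:

```python
def tail_entity_accuracy(predict_tail, test_triplets):
    """Accuracy measured only on the tail entity t of each test triplet
    (h, r, t); the head entity is never used as a prediction target."""
    correct = sum(int(predict_tail(h, r) == t) for h, r, t in test_triplets)
    return correct / len(test_triplets)
```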
Correct me if I'm wrong, but I think MetaQA mainly focuses on testing how well a QA system can take a natural-language (NL) multi-hop query and ground the NL...
I will look into the GPU usage issue that arises when iterative training is activated. The hyperparameters for all 4 datasets are shown in the command lines in README.md, you will...
BTW, you can also find the best hyperparameters in Table 8 of our paper (appendix).
Good question! In SFT, the loss is usually computed by taking a token-level mean within each sample and then a sequence-level mean across samples, which is exactly how Equation (2) is computed. If you instead take a token-level mean across different samples, samples with more target tokens receive more weight (effectively being upsampled), which introduces an imbalance between samples. If you want the total loss computed as "sum of target-token losses / total number of target tokens" as you describe, simply replace the per-sample token-level mean in the code with the sum of that sample's target-token losses.
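For illustration, here is a minimal PyTorch sketch contrasting the two aggregation schemes (the names `token_losses` and `target_masks` are hypothetical placeholders for per-token cross-entropy losses and target-token masks, not variables from our code):

```python
import torch

def loss_sample_then_batch_mean(token_losses, target_masks):
    """Equation (2) style: token-level mean within each sample,
    then sequence-level mean across samples."""
    per_sample = [
        (l * m).sum() / m.sum()            # mean over that sample's target tokens
        for l, m in zip(token_losses, target_masks)
    ]
    return torch.stack(per_sample).mean()  # mean over samples

def loss_global_token_mean(token_losses, target_masks):
    """'Sum of target-token losses / total number of target tokens':
    samples with more target tokens are effectively upsampled."""
    total = sum((l * m).sum() for l, m in zip(token_losses, target_masks))
    count = sum(m.sum() for m in target_masks)
    return total / count
```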
When loss weighting is used with packing training, `self.pack_loss` is set to `True`; please refer to the code under `if self.pack_loss:`. We first multiply the loss on each target token within a sample by its weight and take the sum, and then the losses of different samples across multiple GPUs are averaged in `transformers.Trainer`.
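A minimal sketch of the behavior under `if self.pack_loss:` (the variable names below are illustrative, not the actual ones in the repository): the per-token losses are multiplied by precomputed weights and summed within the packed sample, and `transformers.Trainer` then averages these per-sample losses across GPUs.

```python
def packed_weighted_loss(token_losses, loss_weights):
    """Weighted sum over one packed sample: each target token's loss is
    multiplied by its precomputed weight and summed (weights are assumed
    to be zero on non-target tokens)."""
    return (token_losses * loss_weights).sum()
```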
Seems like this PR also aims to mitigate the gradient accumulation issue in `transformers`: https://github.com/huggingface/transformers/pull/34191
Hi, sorry, I can no longer find the environment I used at the time. Besides the basics like torch and numpy, installing the transformers library should be enough; any reasonably recent version should work. If you run into specific environment issues, feel free to ask!
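If it helps, here is a quick sanity check for the environment (it only verifies that the packages mentioned above import; no specific versions are required):

```python
# Verify the basic packages are installed; reasonably recent versions should work.
import numpy
import torch
import transformers

print("numpy:", numpy.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```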