LongBench

Is the avg length in the dataset measured in words/characters, or in number of tokens?

Open • deepindeed2022 opened this issue 10 months ago • 1 comment

deepindeed2022 • Apr 23 '24 01:04

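Because different models use different tokenizers, we report avg length in a tokenizer-independent unit: the number of characters for Chinese datasets and the number of words for English datasets.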

bys0318 • Apr 23 '24 14:04
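For readers who want to reproduce a comparable number on their own data, here is a minimal sketch of the measure described in the reply above. It is not taken from the LongBench code: the function names `text_length` and `average_length` are hypothetical, and the exact counting rules are assumptions (every character counts for Chinese, including punctuation, and English words are split on whitespace).

```python
from typing import Iterable


def text_length(text: str, language: str) -> int:
    """Return a tokenizer-independent length for one text.

    Assumption: "zh" counts every character (punctuation included),
    "en" counts whitespace-separated words.
    """
    if language == "zh":
        return len(text)
    if language == "en":
        return len(text.split())
    raise ValueError(f"unsupported language: {language!r}")


def average_length(texts: Iterable[str], language: str) -> float:
    """Average the per-text lengths; returns 0.0 for an empty input."""
    lengths = [text_length(t, language) for t in texts]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    print(average_length(["你好世界", "长文本"], "zh"))
    print(average_length(["hello long context world", "two words"], "en"))
```

Token counts for the same texts depend on each model's tokenizer, so they will usually differ from these values.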