
Open-LLaMA-3B results are much worse than reported in this repo

XinnuoXu opened this issue · 5 comments

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| anli_r1 | 0 | acc | 0.3330 | ± 0.0149 |
| anli_r2 | 0 | acc | 0.3320 | ± 0.0149 |
| anli_r3 | 0 | acc | 0.3367 | ± 0.0136 |
| arc_challenge | 0 | acc | 0.2099 | ± 0.0119 |
| | | acc_norm | 0.2705 | ± 0.0130 |
| arc_easy | 0 | acc | 0.2542 | ± 0.0089 |
| | | acc_norm | 0.2517 | ± 0.0089 |
| hellaswag | 0 | acc | 0.2621 | ± 0.0044 |
| | | acc_norm | 0.2741 | ± 0.0045 |
| openbookqa | 0 | acc | 0.1800 | ± 0.0172 |
| | | acc_norm | 0.2500 | ± 0.0194 |
| piqa | 0 | acc | 0.5147 | ± 0.0117 |
| | | acc_norm | 0.5011 | ± 0.0117 |
| record | 0 | f1 | 0.2017 | ± 0.0040 |
| | | em | 0.1964 | ± 0.0040 |
| rte | 0 | acc | 0.4946 | ± 0.0301 |
| truthfulqa_mc | 1 | mc1 | 0.2375 | ± 0.0149 |
| | | mc2 | 0.4767 | ± 0.0169 |
| wic | 0 | acc | 0.5000 | ± 0.0198 |
| winogrande | 0 | acc | 0.5099 | ± 0.0140 |

XinnuoXu avatar Jul 06 '23 10:07 XinnuoXu

It seems that the anli_* and truthfulqa_mc results are similar, but the rest are about 20% worse. I'm wondering whether the results reported in this repo for hellaswag and ARC_* are few-shot = 0 or not?

XinnuoXu avatar Jul 06 '23 10:07 XinnuoXu

Everything reported here is zero-shot. Did you turn off the fast tokenizer when evaluating? There is a bug in the recent release of the transformers library which causes the auto-converted fast tokenizer to output different tokens than the original tokenizer. Therefore, when evaluating OpenLLaMA, you need to turn off the fast tokenizer.
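
For example, a minimal sketch of loading the slow tokenizer (assuming the `openlm-research/open_llama_3b` checkpoint on the Hugging Face Hub):

```python
from transformers import AutoTokenizer

# use_fast=False forces the original (slow) SentencePiece tokenizer instead of
# the auto-converted fast tokenizer affected by the bug.
tokenizer = AutoTokenizer.from_pretrained(
    "openlm-research/open_llama_3b", use_fast=False
)
print(tokenizer.tokenize("The quick brown fox"))
```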

young-geng avatar Jul 07 '23 07:07 young-geng

Is that bug still there? I thought I read somewhere that it got fixed.

buzzCraft avatar Jul 08 '23 07:07 buzzCraft

@buzzCraft It got fixed in the main branch of transformers, but there hasn't been a release with that fix yet.
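
In the meantime, installing transformers from source (e.g. `pip install git+https://github.com/huggingface/transformers.git`) should pick up the fix.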

young-geng avatar Jul 08 '23 07:07 young-geng

@young-geng Ok, since we are on the bleeding edge of the LLM field, I usually go with the dev branch.

I also want to thank you and the team for the amazing work you have done. ❤️

buzzCraft avatar Jul 08 '23 07:07 buzzCraft