lm-evaluation-harness
truthfulqa_mc2 is NaN, while truthfulqa_mc1 is 1.00
I fine-tuned a model based on llama-2-hf and ran the evaluation with the command below. truthfulqa_mc2 comes out as NaN, while truthfulqa_mc1 is 1.00.
What does that mean?
```
python main.py --model hf-causal-experimental --model_args pretrained=../mamba-gpt-7b-v2 --tasks anli_r1,anli_r2,anli_r3,arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,record,rte,truthfulqa_mc,wic,winogrande --device cuda:0
```

hf-causal-experimental (pretrained=../mamba-gpt-7b-v2), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task | Version | Metric | Value | | Stderr |
|---------------|--------:|----------|-------:|---|-------:|
| anli_r1 | 0 | acc | 0.3340 | ± | 0.0149 |
| anli_r2 | 0 | acc | 0.3340 | ± | 0.0149 |
| anli_r3 | 0 | acc | 0.3350 | ± | 0.0136 |
| arc_challenge | 0 | acc | 0.2270 | ± | 0.0122 |
| | | acc_norm | 0.2270 | ± | 0.0122 |
| arc_easy | 0 | acc | 0.2508 | ± | 0.0089 |
| | | acc_norm | 0.2508 | ± | 0.0089 |
| boolq | 1 | acc | 0.3783 | ± | 0.0085 |
| hellaswag | 0 | acc | 0.2504 | ± | 0.0043 |
| | | acc_norm | 0.2504 | ± | 0.0043 |
| openbookqa | 0 | acc | 0.2760 | ± | 0.0200 |
| | | acc_norm | 0.2760 | ± | 0.0200 |
| piqa | 0 | acc | 0.4951 | ± | 0.0117 |
| | | acc_norm | 0.4951 | ± | 0.0117 |
| record | 0 | f1 | 0.1186 | ± | 0.0032 |
| | | em | 0.1151 | ± | 0.0032 |
| rte | 0 | acc | 0.5271 | ± | 0.0301 |
| truthfulqa_mc | 1 | mc1 | 1.0000 | ± | 0.0000 |
| | | mc2 | NaN | ± | NaN |
| wic | 0 | acc | 0.5000 | ± | 0.0198 |
| winogrande | 0 | acc | 0.4957 | ± | 0.0141 |
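For context on what this score combination can mean: mc1 checks whether the single highest-likelihood answer is the first (true) one, while mc2 is the probability mass assigned to the true answers, normalized over all answers. If the model returns degenerate log-likelihoods (all NaN, or all equal at -inf), `argmax` falls back to index 0, so mc1 comes out as a perfect 1.0 while mc2 divides NaN (or 0/0) by itself and becomes NaN. The near-chance scores on every other task in the table are consistent with such degenerate outputs. Below is a minimal sketch of that failure mode; the all-NaN scores are hypothetical, and the metric code is a paraphrase of the TruthfulQA logic, not the harness's verbatim implementation:

```python
import numpy as np

# Hypothetical degenerate log-likelihoods for one TruthfulQA question
# (e.g., from broken weights after a bad LoRA merge): every answer
# choice scores NaN. split_idx marks where the true answers end.
lls = np.array([np.nan, np.nan, np.nan, np.nan])
split_idx = 2

# mc1 (paraphrased): is the top-scoring answer the first, true one?
# np.argmax over an all-NaN (or all-equal) array returns index 0,
# so a broken model still "earns" a perfect mc1.
mc1 = float(np.argmax(lls) == 0)

# mc2 (paraphrased): probability mass on true answers, normalized
# over all answers. NaN (or 0/0 for all -inf scores) propagates.
p_true, p_false = np.exp(lls[:split_idx]), np.exp(lls[split_idx:])
mc2 = p_true.sum() / (p_true.sum() + p_false.sum())

print(mc1, mc2)  # 1.0 nan
```

So mc1 = 1.00 together with mc2 = NaN usually points at the model producing non-finite or constant scores, rather than at a bug in the metric itself.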
I have the same issue! In my case I had done some operations to change or move the LoRA weights in my code. Have you solved it?
This issue should be solved in the main branch.
@lintangsutawika I used the main branch and the issue is still there. I opened an issue: https://github.com/EleutherAI/lm-evaluation-harness/issues/1340
@lintangsutawika How can this be fixed? Can you share the PR? Thanks
@choco9966 can you share a public model + sample command that reproduces this issue?