lm-evaluation-harness
Generator Error when evaluating GLUE and SuperGLUE
The error information shows that:
And the corresponding code is:
I guess the error was caused by the parameter <n_reordered_requests>, which is a generator, and that was assigned by:
So I think the change in #1197 introduced this error. When I check out the version from two weeks ago, everything runs normally.
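To illustrate the kind of failure suspected above, here is a minimal sketch of what can go wrong when a generator is passed where a list is expected. It is not the harness's actual code: `reorder` and `count_requests` are hypothetical names made up for this example, and the real traceback may differ.

```python
from typing import Iterable, Iterator, List


def reorder(requests: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a reordering helper: yields requests longest first."""
    yield from sorted(requests, key=len, reverse=True)


def count_requests(requests: Iterable[str]) -> int:
    """Count requests whether the caller passes a list or a generator."""
    # Materialize first: len() is not defined for generators.
    return len(list(requests))


requests = ["a b c", "a", "a b"]
gen = reorder(requests)

try:
    len(gen)  # raises TypeError: generators have no length
    failed = False
except TypeError:
    failed = True

# Safe variant: convert to a list before asking for its size.
n = count_requests(reorder(requests))
```

If the regression is of this kind, wrapping the generator in `list(...)` before it is measured or iterated a second time is usually enough, at the cost of holding all requests in memory.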
This should have been fixed in #1229. Are you on the latest commit?
Yes, the version I used was commit #1238
Sorry, it seems to be ok in #1238, and I didn't realize the bug was fixed yesterday. Thank you!
hmm. Can you provide the full command? The previous bug occurred only when using batch "auto".
Yeah,
lm_eval --model hf \
--model_args pretrained=gpt2-xl,trust_remote_code=true,dtype=bfloat16 \
--tasks glue,gsm8k,super-glue-lm-eval-v1 \
--batch_size auto \
--output_path ./eval_out/gpt2-xl \
--device cuda:0
btw, the following command triggered another error
lm_eval --model hf \
--model_args pretrained=Qwen/Qwen-14B-Chat,trust_remote_code=true,dtype=bfloat16 \
--tasks glue,gsm8k,super-glue-lm-eval-v1 \
--batch_size auto \
--output_path ./eval_out/qwen-14b \
--device cuda:0
The second one looks like a tokenizer bug. @haileyschoelkopf
Even worse, the first command cannot run properly
lm_eval --model hf \
--model_args pretrained=gpt2-xl,trust_remote_code=true,dtype=bfloat16 \
--tasks glue,gsm8k,super-glue-lm-eval-v1 \
--batch_size auto \
--output_path ./eval_out/gpt2-xl \
--device cuda:0
Furthermore, I traced the error to the super-glue-lm-eval-v1 evaluation. Any ideas on how to solve it?
Taking a look right now!