lm-evaluation-harness: The response is too short to extract an answer on GPQA. What should I set to extend it?

Open URRealHero opened this issue 7 months ago • 1 comments

lm_eval --model local-chat-completions --tasks gpqa_main_cot_zeroshot --model_args model=Qwen/Qwen2-72B-Instruct,base_url=https://api.together.xyz/v1 --output_path ./gpqa/result/Qwen2 --use_cache ./gpqa/cache/Qwen2 --log_samples --limit 10 --gen_kwargs temperature=0.7,max_tokens=8192

With this command, Qwen2's responses end abruptly, as in the image below.

Specifically, only 256 tokens are generated. Why does this happen? Is there a problem with max_tokens?

Jul 08 '24 16:07 URRealHero
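
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the backend reads its generation limit from a key named `max_gen_toks` and falls back to a default of 256 when that key is absent, then passing `max_tokens=8192` would have no effect and the output would stop at exactly 256 tokens. The sketch below only illustrates that fallback. `parse_gen_kwargs`, `resolve_generation_length`, and `DEFAULT_MAX_GEN_TOKS` are hypothetical names, not the harness's actual code.

```python
# Hypothetical illustration of a key-name fallback; NOT the harness's real code.
DEFAULT_MAX_GEN_TOKS = 256  # assumed default, chosen to match the 256-token cutoff reported above


def parse_gen_kwargs(arg: str) -> dict:
    """Parse a 'key=value,key=value' string like the one passed to --gen_kwargs."""
    pairs = (item.split("=", 1) for item in arg.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def resolve_generation_length(gen_kwargs: dict, default: int = DEFAULT_MAX_GEN_TOKS) -> int:
    """Return the limit a backend would use if it only consulted `max_gen_toks`."""
    return int(gen_kwargs.get("max_gen_toks", default))


as_in_issue = parse_gen_kwargs("temperature=0.7,max_tokens=8192")
with_alt_key = parse_gen_kwargs("temperature=0.7,max_gen_toks=8192")

print(resolve_generation_length(as_in_issue))   # key not consulted, so the assumed default applies
print(resolve_generation_length(with_alt_key))  # explicit value is picked up
```

If that assumption holds for the installed version, replacing `max_tokens=8192` with `max_gen_toks=8192` in `--gen_kwargs` would be the first thing to try. Checking the logged request or the model class's source would confirm which key is actually read.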
