lm-evaluation-harness
Erroneous GSM8K results produced by gpt-3.5-turbo
Thank you for your outstanding work!
Recently, I have been trying to evaluate GPT-3.5-turbo on GSM8K, but I am getting unexpectedly poor results, as shown in the figure below:
This is the command I am using. Could you tell me what might be causing this?
lm_eval --model openai-chat-completions --model_args model="gpt-3.5-turbo" --tasks gsm8k
Looking forward to your help!