lm-evaluation-harness
Erroneous GSM8K results produced by gpt-3.5-turbo
Thank you for your outstanding work!
Recently, I have been trying to evaluate GPT-3.5-turbo on GSM8K, but I am getting unexpectedly poor results, as shown in the figure below:
This is the command I am using. Could you tell me what might be causing this?
lm_eval --model openai-chat-completions --model_args model="gpt-3.5-turbo" --tasks gsm8k
Looking forward to your help!