simple-evals



Run benchmarks for old GPT-4 models (GPT-4-0314 and GPT-4-0613) and all GPT-3.5-turbo models

Open mikita-apollo opened this issue 9 months ago • 0 comments

Zero-shot scores for those models are not easily googleable — so this would be very useful for looking at the improvement trend over time!

May 14 '24 16:05 mikita-apollo