
Guide on how to evaluate models

Open kisimoff opened this issue 4 months ago • 1 comments

I'm willing to test a few models and share the results. I've looked at the README, but couldn't wrap my head around how to benchmark a model. Any help would be appreciated!

kisimoff avatar Apr 03 '24 09:04 kisimoff

The docs definitely need a rewrite, my apologies.

The general flow is:

  1. prepare.py
  2. interview*.py
  3. eval.py

In the dark days we had to deal with dozens of prompt formats, but these days prepare.py can be run with --chat hfmodel and it will sort out the prompt format for you.

Note that there are two interviews, junior-v2 and senior; I usually only run senior on strong models that score >90% on junior.
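Putting the three steps together, a run might look roughly like this (a sketch only: the exact interview script name, its arguments, and the output file paths are assumptions on my part, so check each script's --help for the real flags):

```shell
# 1. Prepare the junior-v2 interview prompts; per the note above,
#    --chat hfmodel applies the model's own chat template.
python3 prepare.py --chat hfmodel

# 2. Run the interview. There are several interview*.py scripts;
#    which one you use depends on your backend (local GPU, API, etc.),
#    and the input/output arguments shown here are illustrative.
python3 interview_cuda.py --input prepared.ndjson

# 3. Evaluate the model's answers and produce scores.
python3 eval.py --input results.ndjson
```

If the model clears >90% on junior-v2, repeat the same three steps with the senior interview.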

the-crypt-keeper avatar Apr 07 '24 23:04 the-crypt-keeper