
Clarify Documentation About Running The Benchmark

hamelsmu opened this issue 7 months ago · 1 comment

I had trouble following the instructions for running the benchmark.

  1. Why do we have to apply the credentials one file at a time? Don't all the files get concatenated by eval_data_compilation.py?


  2. The TEST_CATEGORY values accepted by openfunctions_evaluation.py and eval_checker/eval_runner.py are not the same. Why are the test categories different?


  3. Should the MODEL_NAME be the same between openfunctions_evaluation.py and eval_checker/eval_runner.py? Or is there a reason you might want them to be different?
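On question 1: if the per-category files really are concatenated, credentials could in principle be substituted in a single pass at compilation time rather than file by file. Here is a minimal sketch of that idea; the file names, placeholder keys, and substitution logic are hypothetical illustrations, not the actual behavior of eval_data_compilation.py.

```python
import json
import os
import tempfile

def compile_eval_data(input_paths, output_path, credentials):
    """Concatenate JSON-lines test files into one output file,
    substituting credential placeholders once during compilation."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Replace every known placeholder with its real value.
                    for placeholder, value in credentials.items():
                        line = line.replace(placeholder, value)
                    out.write(line + "\n")

# Hypothetical usage: two per-category files merged with one credential pass.
tmp = tempfile.mkdtemp()
a = os.path.join(tmp, "gorilla_openfunctions_v1_test_rest.json")
b = os.path.join(tmp, "gorilla_openfunctions_v1_test_simple.json")
with open(a, "w") as f:
    f.write(json.dumps({"question": "look up a film", "api_key": "YOUR_OMDB_KEY"}) + "\n")
with open(b, "w") as f:
    f.write(json.dumps({"question": "add two numbers"}) + "\n")

combined = os.path.join(tmp, "compiled.json")
compile_eval_data([a, b], combined, {"YOUR_OMDB_KEY": "abc123"})
```

If compilation worked this way, the credentials file would only need to be edited once, which is the workflow the question is implicitly asking for.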

It would be helpful to more fully describe what each step does and why, if possible. A related question: in my experience the OMDB API is really flaky. I intermittently get 401 errors for no apparent reason, even on the same request (sometimes 200, sometimes 401). I couldn't figure out how to turn off the test cases that involve this API, and the failures it causes break the run. Understanding what each step is meant to do in more detail would help me run the benchmark!
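Until there is a documented way to exclude the OMDB-backed test cases, one workaround for the intermittent 401s described above is to wrap the request in a retry with exponential backoff, since the same request sometimes succeeds on a second attempt. A generic sketch, not part of the gorilla codebase:

```python
import time

def call_with_retry(fn, retries=3, base_delay=0.1, retryable=(401,)):
    """Call fn() -> (status, body); if the status is retryable,
    back off exponentially and try again. Returns the last result."""
    for attempt in range(retries):
        status, body = fn()
        if status not in retryable:
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated flaky endpoint: 401 twice, then 200 on the third attempt.
responses = iter([(401, ""), (401, ""), (200, "ok")])
status, body = call_with_retry(lambda: next(responses), base_delay=0.0)
```

This only papers over the flakiness; a way to disable the affected test cases outright would still be the cleaner fix.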

cc: @HuanzhiMao

hamelsmu · Jul 06 '24 15:07