Clarify Documentation About Running The Benchmark
I had trouble following the instructions for running the benchmark, and I have a few questions:
- Why do we have to apply the credentials one file at a time? Don't all the files get concatenated by `eval_data_compilation.py`? (See the sketch after this list for what I assumed that step does.)
- The `TEST_CATEGORY` values are not the same between `openfunctions_evaluation.py` and `eval_checker/eval_runner.py`. Why are the test categories different?
- Should the `MODEL_NAME` be the same between `openfunctions_evaluation.py` and `eval_checker/eval_runner.py`, or is there a reason you might want them to be different?
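For context on the first question, here is roughly what I assumed a compilation step like `eval_data_compilation.py` does — a minimal sketch only, not the actual script; the directory name and output path are my guesses:

```python
# Minimal sketch of what I assumed the compilation step does; the
# "data" directory and output file name below are guesses, not the
# real layout used by the repo.
import glob
import json

def compile_eval_data(data_dir="data", out_path="data_compiled.json"):
    """Concatenate every per-category JSON-lines file into one file."""
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(f"{data_dir}/*.json")):
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Re-serialize each entry so the output stays one
                    # JSON object per line.
                    out.write(json.dumps(json.loads(line)) + "\n")

if __name__ == "__main__":
    compile_eval_data()
```

If the step really is a plain concatenation like this, it isn't obvious to me why credentials couldn't be injected once into the compiled file instead of per source file.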
It would be helpful to describe more fully what each step does and why, if possible.

Separately, the OMDB API is really flaky in my experience: I intermittently get 401 errors for no apparent reason, even on the identical request (sometimes 200, sometimes 401). I couldn't figure out how to turn off the test cases that involve this API, and the failures broke my runs; a workaround sketch is below. Perhaps understanding what each step is meant to do in more detail will help me run the benchmark!
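In case it's useful to anyone else hitting this, here is the kind of client-side retry wrapper I've been using to paper over the intermittent 401s. This is purely a guess at a workaround on my end, not anything from the benchmark code; the API key and movie title are placeholders:

```python
# Hypothetical workaround for the flaky OMDB responses I'm seeing;
# the apikey and title parameters below are placeholders.
import time
import requests

def get_with_retry(url, params, retries=5, backoff=1.0):
    """Retry a GET when OMDB intermittently 401s on a valid request."""
    for attempt in range(retries):
        resp = requests.get(url, params=params, timeout=10)
        if resp.status_code == 200:
            return resp.json()
        # The same request sometimes returns 200 and sometimes 401,
        # so retry on failure with a short exponential backoff.
        time.sleep(backoff * (2 ** attempt))
    resp.raise_for_status()

if __name__ == "__main__":
    result = get_with_retry(
        "http://www.omdbapi.com/",
        {"apikey": "YOUR_OMDB_KEY", "t": "Inception"},
    )
    print(result)
```

Even with retries, though, I'd prefer a supported way to exclude the OMDB-dependent test cases entirely.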
cc: @HuanzhiMao