
work dirs

Open laanak08 opened this issue 10 months ago • 1 comments

Invocation

  • help: cargo run --bin goose -- bench --help
  • cargo run --bin goose -- bench to run the "core" suite of benchmarks
  • cargo run --bin goose -- bench -s $suite_name1,$suite_name2,...,etc
  • cargo run --bin goose -- bench --repeat 3 to run the evals 3 times
  • cargo run --bin goose -- bench -i "some_dir,some_other_dir" to have some_dir and some_other_dir copied into the relevant work-dir that needs them
  • add new benchmark-suites to crates/goose-bench/src/eval_suites

How Work-Dirs...work

  • the purpose of the work-dir is to have a place to read-write files, that can be referenced as the "current directory" from within the evaluation code
  • each invocation of goose bench creates a dir for the provider if it does not already exist, under which there is
    • a date-time dir for the run, under which,
      • a dir per eval-suite, under which,
        • a dir for the eval itself
  • multiple runs for the same provider result in a tree like the one in the attached screenshot (Screenshot 2025-02-20 at 12 03 56 PM)
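
The nesting described above can be sketched as a small path helper. This is only an illustration of the provider / date-time / suite / eval layout, not the goose-bench implementation: the function names, the root directory, the provider name, and the timestamp format are all assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds <root>/<provider>/<run_stamp>/<suite>/<eval>.
/// The parameter names and timestamp format are illustrative assumptions.
fn eval_work_dir(root: &str, provider: &str, run_stamp: &str, suite: &str, eval: &str) -> PathBuf {
    Path::new(root)
        .join(provider)  // one dir per provider, reused across runs
        .join(run_stamp) // one date-time dir per run
        .join(suite)     // one dir per eval-suite
        .join(eval)      // one dir for the eval itself
}

/// Creates the work-dir and any missing parents; succeeds if it already exists.
fn ensure_work_dir(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)
}
```

Because the provider dir is shared and only the date-time component changes, repeated runs for one provider accumulate as sibling run dirs under it.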

Semantics [DO NOT SKIP READING]

  • there is a core suite of evaluations that runs by default if the --suites cli flag is not set
    • differently stated, any evaluation not included in core will not run by default
  • if --suites is supplied, only the suites in that list will run, so if core isn't part of the list passed to --suites, it will not run.
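
A minimal sketch of these selection rules, assuming the flag value arrives as an optional comma-separated string. The function name `suites_to_run` is hypothetical and is not taken from the goose-bench source.

```rust
/// Hypothetical helper mirroring the semantics above:
/// no --suites flag means only "core" runs; otherwise exactly the listed suites run.
fn suites_to_run(suites_flag: Option<&str>) -> Vec<String> {
    match suites_flag {
        // Flag not set: fall back to the default "core" suite.
        None => vec!["core".to_string()],
        // Flag set: run only what was listed; "core" is NOT added implicitly.
        Some(list) => list
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    }
}
```

Under this sketch, passing a list that omits core drops core from the run, which is the behavior the section warns about.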

Individual Evals

  • example can be examined here: crates/goose-bench/src/eval_suites/core/example.rs
  • groups of related evals can be placed together in a Rust module representing the suite, e.g. crates/goose-bench/src/eval_suites/core
    • In this example core is the $suite_name
    • where each eval is in its own file at crates/goose-bench/src/eval_suites/core/$eval_name
    • register new evals to the suite-name.
      • ex. the suite named core has one eval, example, so it's registered as follows:
      • register_evaluation!("core", ExampleEval)
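
To illustrate the idea of registering evals under a suite name, here is a self-contained sketch of a suite-keyed registry. It is not the actual `register_evaluation!` macro or its expansion: the `Evaluation` trait, the `Registry` type, and their methods are assumptions made up for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait standing in for whatever an eval implements.
trait Evaluation {
    fn name(&self) -> &str;
}

/// Stand-in for the example eval mentioned above.
struct ExampleEval;

impl Evaluation for ExampleEval {
    fn name(&self) -> &str {
        "example"
    }
}

/// Hypothetical registry: suite name -> evals registered under it.
#[derive(Default)]
struct Registry {
    suites: BTreeMap<String, Vec<Box<dyn Evaluation>>>,
}

impl Registry {
    /// Adds an eval to the named suite, creating the suite entry on first use.
    fn register(&mut self, suite: &str, eval: Box<dyn Evaluation>) {
        self.suites.entry(suite.to_string()).or_default().push(eval);
    }

    /// Names of the evals registered under a suite (empty if the suite is unknown).
    fn evals_in(&self, suite: &str) -> Vec<&str> {
        self.suites
            .get(suite)
            .map(|evals| evals.iter().map(|e| e.name()).collect())
            .unwrap_or_default()
    }
}
```

In this sketch, `registry.register("core", Box::new(ExampleEval))` plays the role that `register_evaluation!("core", ExampleEval)` plays in the issue text.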

Limitations

  • [x] no namespacing until this PR is merged in.
    • until then, an eval acts wherever it's run and does whatever it's allowed to do (via exts), without isolating its work in a tmp env
    • [x] copy files needed for eval into eval work-dir
  • [ ] bug: building with --release affects which eval suites are run. To Be Debugged
  • [ ] summary/run-report/errors-report
  • [ ] tracing. maybe it works, maybe it doesn't; not yet checked.
  • [ ] ~~does not handle configuring ollama. still necessary to manually config before running bench~~
  • [ ] ~~test multiple configs easily.~~
    • ~~currently runs tests for the agent/config that's active in the environment it's run in.~~
  • [ ] ~~parallelize at evals-level, or suite-level, or goose-bench~~

Struck items are outside the scope of current bench-work.

laanak08, Feb 20 '25 13:02

PR Preview Action v1.6.0

:rocket: View preview at
https://block.github.io/goose/pr-preview/pr-1307/

Built to branch gh-pages at 2025-02-28 21:37 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

github-actions[bot], Feb 20 '25 16:02