goose
feat: refactor register eval
Changes
For Bench Users
- The CLI flag `-s` now refers to `--selectors`, where a selector is a colon-delimited string of suite, sub-suite(s), and eval filename, e.g. `bench -s core:developer:web_scrape -s "core:memory, vibes"`
- top-level suite-result reports will be for the lowest level of suite, e.g.
  - for selector `core:developer`, the results report will be for `core:developer` and not just `core`, because here `developer` is the lowest level of suite grouping.
  - for selector
- if multiple selectors are supplied where one selector is a child of another, the more general selector will be chosen, e.g. with `-s core -s core:developer`, everything in `core` will be run.
- the `--list` flag has been updated to return a list of every valid selector that can be passed to `-s`, along with the number of evals each will run.
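The "more general selector wins" rule can be illustrated with a minimal Rust sketch. `dedupe_selectors` is a hypothetical helper written for illustration, not goose's code, and it assumes selectors are compared on whole colon-separated components, so `core` covers `core:developer` but `core:dev` does not.

```rust
/// Hypothetical helper (not goose's implementation): drops any selector that is
/// already covered by a more general selector supplied on the command line.
/// Assumption: "child of" means a component-wise prefix on ':'-separated parts.
fn dedupe_selectors(selectors: &[&str]) -> Vec<String> {
    // Split each selector once so comparisons use whole components, not raw characters.
    let parsed: Vec<Vec<&str>> = selectors.iter().map(|s| s.split(':').collect()).collect();
    let mut kept: Vec<String> = Vec::new();
    for (i, sel) in parsed.iter().enumerate() {
        // Covered if some *other*, strictly shorter selector is a prefix of this one.
        // The length check stops exact duplicates from eliminating each other.
        let covered = parsed.iter().enumerate().any(|(j, other)| {
            j != i && other.len() < sel.len() && sel.starts_with(other)
        });
        let joined = sel.join(":");
        // Keep uncovered selectors once, preserving the order they were given in.
        if !covered && !kept.contains(&joined) {
            kept.push(joined);
        }
    }
    kept
}
```

With `-s core -s core:developer` this keeps only `core`, regardless of argument order, and repeated identical selectors collapse to a single entry.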
For Eval Authors
Adding an Eval
- If a suitable suite doesn't yet exist under `eval_suites/`, create a Rust module (plus any desired sub-modules) for it and place the eval file there.
- Within the eval file, be sure to end the implementation with a call to `register_evaluation!(MyNewEval);`
- the eval will then be selectable as `your_new_suite_name:eval_filename`, or, if nested deeper, `your_new_suite_name:your_also_new_subsuite_name:eval_filename`.
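The way a file's location turns into its selector can be sketched as follows. `selector_for` is a hypothetical helper for illustration only; it assumes the eval file sits somewhere beneath a directory literally named `eval_suites`, and that the selector is the remaining path components, with the `.rs` extension dropped, joined by colons.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: maps an eval file under `eval_suites/` to the selector
/// a user would pass to `bench -s`. Returns None when the file is not under
/// `eval_suites/` (or sits directly in it with nothing after it).
fn selector_for(eval_file: &Path) -> Option<String> {
    // Drop the extension so `web_scrape.rs` becomes `web_scrape`.
    let stem_path = eval_file.with_extension("");
    let mut parts = stem_path
        .components()
        // Keep ordinary path segments; non-UTF-8 segments are skipped in this sketch.
        .filter_map(|c| match c {
            Component::Normal(os) => os.to_str(),
            _ => None,
        })
        // Fast-forward to the `eval_suites` directory itself.
        .skip_while(|part| *part != "eval_suites");
    parts.next()?; // consume the `eval_suites` component
    let selector: Vec<&str> = parts.collect();
    if selector.is_empty() {
        None
    } else {
        Some(selector.join(":"))
    }
}
```

Under these assumptions, `eval_suites/core/developer/web_scrape.rs` maps to `core:developer:web_scrape`, matching the selector used in the CLI example earlier in this PR.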
Design
Ingestion/Pre-Processing
- each eval registers itself with the `register_evaluation` macro defined in `factory.rs`.
- this macro updates the `registry`, which is a map from the path to each eval to that eval's constructor.
- in the registry, the path to the eval is converted into a "selector" by substituting all the path separators with colons.
- once complete, the registry will be populated with all eval paths (with components colon-separated) and their respective constructors.
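A minimal sketch of such a registry, under several assumptions: the `Evaluation` trait here is a one-method stand-in, constructors are plain function pointers, and `Registry`, `EvalConstructor`, `WebScrape`, and `register_evaluation_explicit!` are all hypothetical names. The real `register_evaluation!` is invoked with only the eval type; how it discovers the path and runs at startup isn't described in this PR, so this sketch passes the registry and path explicitly.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the eval trait; the real one is not shown in this PR.
trait Evaluation {
    fn name(&self) -> &str;
}

/// A constructor builds a fresh boxed eval on demand.
type EvalConstructor = fn() -> Box<dyn Evaluation>;

/// Registry keyed by selector (path components joined with ':').
struct Registry {
    evals: BTreeMap<String, EvalConstructor>,
}

impl Registry {
    fn new() -> Self {
        Registry { evals: BTreeMap::new() }
    }

    /// Converts a '/'-separated path into a selector by swapping separators for colons.
    fn register(&mut self, path: &str, constructor: EvalConstructor) {
        let without_ext = path.strip_suffix(".rs").unwrap_or(path);
        let selector = without_ext.replace('/', ":");
        self.evals.insert(selector, constructor);
    }
}

/// Hypothetical explicit variant of the registration macro: takes the registry,
/// the eval's path relative to `eval_suites/`, and the eval type (which must impl Default).
macro_rules! register_evaluation_explicit {
    ($registry:expr, $path:expr, $eval:ty) => {
        $registry.register($path, || -> Box<dyn Evaluation> { Box::new(<$eval>::default()) })
    };
}

/// Example eval used only to exercise the sketch.
#[derive(Default)]
struct WebScrape;

impl Evaluation for WebScrape {
    fn name(&self) -> &str {
        "web_scrape"
    }
}
```

After `register_evaluation_explicit!(registry, "core/developer/web_scrape.rs", WebScrape)`, the key `core:developer:web_scrape` maps to a constructor for `WebScrape`. A `BTreeMap` keeps selectors sorted, which is convenient for listing; that is a choice made for this sketch, not something the PR specifies.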
Lookup
- The CLI expects to be supplied with one or more selectors of varying granularity.
- since the only knowledge of where evals live, and which suite they belong to, is in the registry keys (the paths to evals), these keys are matched against the user-supplied selectors by checking whether the user string is a prefix of the registry key. It's from here that the idea of a suite emerges; it's not actually tracked in any other way.
- naturally, this also applies to the implementation of `--list` and any related functionality: the suite hierarchies and their constituent evals are extracted from the registry keys.
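The lookup and `--list` behavior can be sketched over a plain slice of registry keys. All function names are hypothetical. One deliberate assumption: this sketch compares whole colon-separated components rather than raw characters, so `core:dev` does not also pick up `core:developer:...`; the PR itself only says the user string is prefix-matched against the key.

```rust
use std::collections::BTreeMap;

/// True if every component of `selector` matches the leading components of `key`.
fn matches_selector(selector: &str, key: &str) -> bool {
    let selector_parts: Vec<&str> = selector.split(':').collect();
    let key_parts: Vec<&str> = key.split(':').collect();
    key_parts.starts_with(&selector_parts)
}

/// Returns every registry key matched by at least one user-supplied selector,
/// in the order the keys are given.
fn select<'a>(keys: &[&'a str], selectors: &[&str]) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|key| selectors.iter().any(|sel| matches_selector(sel, key)))
        .collect()
}

/// Derives a `--list` view from the keys alone: every non-empty prefix of a key
/// is a valid selector, mapped to the number of evals it would run.
fn list_selectors(keys: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for key in keys {
        let parts: Vec<&str> = key.split(':').collect();
        for end in 1..=parts.len() {
            *counts.entry(parts[..end].join(":")).or_insert(0) += 1;
        }
    }
    counts
}
```

Given keys `core:developer:web_scrape` plus a few other (hypothetical) evals, `select(&keys, &["core:developer"])` returns only the evals under `core:developer`, and `list_selectors` reports `core`, `core:developer`, and each full eval key alongside their eval counts, with no separate suite structure needed.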