
feat: refactor register eval

laanak08 opened this issue 9 months ago.

Changes

For Bench Users

  • The CLI flag -s now refers to --selectors, where a selector is a colon-delimited string of suite, sub-suite(s), and eval filename, e.g.
    • bench -s core:developer:web_scrape -s "core:memory, vibes"
  • Top-level suite-result reports are produced for the lowest level of suite named in the selector, e.g.
    • for the selector core:developer,
    • the results report covers core:developer, not just core,
    • because here developer is the lowest level of suite grouping.
  • If multiple selectors are supplied and one is a child of another, the more general selector wins, e.g.
    • -s core -s core:developer
    • here, everything in core will be run.
  • The --list flag has been updated to return every valid selector that can be passed to -s, along with the number of evals each one will run.
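The "more general selector wins" rule can be illustrated with a small normalization pass over the -s arguments. This is a sketch under assumptions, not the PR's code: `components`, `covers`, and `normalize_selectors` are hypothetical names, and the sketch compares whole colon-separated components, so `core:dev` is not treated as a parent of `core:developer`.

```rust
// Split a selector such as "core:developer:web_scrape" into its components.
fn components(selector: &str) -> Vec<&str> {
    selector.split(':').collect()
}

// True if `general` covers `specific`: each component of `general`
// equals the corresponding leading component of `specific`.
fn covers(general: &str, specific: &str) -> bool {
    let g = components(general);
    let s = components(specific);
    g.len() <= s.len() && g.iter().zip(s.iter()).all(|(a, b)| a == b)
}

// Keep only selectors that no other selector covers.
// For exact duplicates, the first occurrence is kept.
fn normalize_selectors(selectors: &[&str]) -> Vec<String> {
    let mut kept = Vec::new();
    for (i, sel) in selectors.iter().enumerate() {
        let subsumed = selectors.iter().enumerate().any(|(j, other)| {
            // `other` is strictly more general, or an earlier duplicate.
            j != i && covers(other, sel) && (!covers(sel, other) || j < i)
        });
        if !subsumed {
            kept.push(sel.to_string());
        }
    }
    kept
}
```

With `-s core -s core:developer`, only `core` survives, regardless of argument order. The pairwise check is quadratic in the number of selectors, which is acceptable for a handful of CLI arguments.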

For Eval Authors

Adding an Eval

  • If a suitable suite doesn't exist yet under eval_suites/, create a Rust module for it (plus any desired sub-modules) and place the eval file there.
  • Within the eval file, end the implementation with a call to register_evaluation!(MyNewEval);
  • The eval will then be selectable as
    • your_new_suite_name:eval_filename, or, if nested deeper,
    • your_new_suite_name:your_also_new_subsuite_name:eval_filename
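To make the mapping from file location to selector concrete, here is a minimal sketch, assuming the path is taken relative to eval_suites/, each directory becomes one selector component, and the `.rs` extension is dropped. `selector_for` is a hypothetical helper, not part of the PR.

```rust
use std::path::{Component, Path};

// Hypothetical helper: map an eval file's path, relative to eval_suites/,
// to its selector, e.g. "core/developer/web_scrape.rs" -> "core:developer:web_scrape".
// Returns None for paths containing anything other than plain names.
fn selector_for(relative_path: &str) -> Option<String> {
    let path = Path::new(relative_path);
    // The eval filename minus its extension becomes the last component.
    let stem = path.file_stem()?.to_str()?;
    let mut parts = Vec::new();
    if let Some(parent) = path.parent() {
        for comp in parent.components() {
            match comp {
                // Each directory (suite or sub-suite) becomes one component.
                Component::Normal(name) => parts.push(name.to_str()?),
                // Reject "..", a leading ".", a root, or a Windows prefix.
                _ => return None,
            }
        }
    }
    parts.push(stem);
    Some(parts.join(":"))
}
```

A file at core/memory.rs would therefore be selectable as core:memory, and one nested one level deeper as core:developer:web_scrape.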

Design

Ingestion/Pre-Processing

  • Each eval registers itself with the register_evaluation! macro defined in factory.rs.
  • This macro updates the registry, which is a map from the path of each eval to its constructor.
  • In the registry, the path to the eval is converted into a "selector" by replacing every path separator with a colon.
  • Once registration is complete, the registry holds every eval path (with colon-separated components) together with its constructor.
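A self-contained sketch of such a registry follows. It is illustrative only: the `Evaluation` trait, `Constructor`, `registry`, `construct`, and `register` are hypothetical stand-ins, since the PR does not show factory.rs. The real macro is invoked with just the type (register_evaluation!(MyNewEval)) and how it obtains the eval's path is not described here, so this simplified macro takes the path explicitly.

```rust
use std::collections::BTreeMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for the eval trait; the PR does not show the real one.
trait Evaluation {
    fn name(&self) -> &str;
}

// Every constructor has the same signature, so plain fn pointers suffice.
type Constructor = fn() -> Box<dyn Evaluation>;

// Global registry: selector -> constructor. A BTreeMap keeps keys sorted,
// which is convenient when deriving suite listings.
fn registry() -> &'static Mutex<BTreeMap<String, Constructor>> {
    static REGISTRY: OnceLock<Mutex<BTreeMap<String, Constructor>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(BTreeMap::new()))
}

// Generic constructor; `construct::<T>` coerces to a `Constructor` pointer.
fn construct<T: Evaluation + Default + 'static>() -> Box<dyn Evaluation> {
    Box::new(T::default())
}

// Convert a path such as "core/developer/web_scrape" into a selector by
// replacing path separators with colons, then store its constructor.
fn register(path: &str, ctor: Constructor) {
    let selector = path.replace('/', ":");
    registry().lock().unwrap().insert(selector, ctor);
}

// Simplified stand-in for register_evaluation!: takes the path explicitly.
macro_rules! register_evaluation {
    ($path:expr, $ty:ty) => {
        register($path, construct::<$ty>)
    };
}

// Example eval used to exercise the registry.
#[derive(Default)]
struct WebScrape;

impl Evaluation for WebScrape {
    fn name(&self) -> &str {
        "web_scrape"
    }
}
```

After `register_evaluation!("core/developer/web_scrape", WebScrape);`, the registry holds the key core:developer:web_scrape, and calling its constructor yields a boxed WebScrape. Storing fn pointers rather than boxed closures keeps the static map trivially `Send`.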

Lookup

  • The CLI expects one or more selectors of varying granularity.
  • The only record of where evals live, and which suite they belong to, is the set of registry keys (the paths to evals). User-supplied selectors are therefore resolved by prefix-matching each selector string against the registry keys. This is where the notion of a suite comes from; suites are not tracked in any other way.
  • The same applies to the implementation of --list and related functionality: the suite hierarchies and their constituent evals are extracted from the registry keys.
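The lookup and the --list derivation can both be sketched directly over the registry keys. This is a sketch under assumptions: `matches`, `select`, and `list_selectors` are hypothetical names, and where the PR describes plain string prefix matching, this version matches whole colon-separated components so that `core:dev` does not select evals under `core:developer`. Whether the PR guards against that case is not stated.

```rust
use std::collections::BTreeMap;

// True if the user-supplied selector is a prefix of the registry key,
// compared component by component.
fn matches(selector: &str, key: &str) -> bool {
    let sel: Vec<&str> = selector.split(':').collect();
    let parts: Vec<&str> = key.split(':').collect();
    sel.len() <= parts.len() && sel.iter().zip(&parts).all(|(a, b)| a == b)
}

// Registry keys selected by any of the given selectors.
fn select<'a>(keys: &[&'a str], selectors: &[&str]) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|key| selectors.iter().any(|s| matches(s, key)))
        .collect()
}

// Derive the --list output: every prefix of every registry key is a valid
// selector, mapped to the number of evals it would run.
fn list_selectors(keys: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for key in keys {
        let parts: Vec<&str> = key.split(':').collect();
        for depth in 1..=parts.len() {
            *counts.entry(parts[..depth].join(":")).or_insert(0) += 1;
        }
    }
    counts
}
```

Because suites exist only as key prefixes, the counts for a suite fall out of the same traversal that enumerates it; no separate suite table is needed. The counts assume registry keys are unique, which a map-based registry guarantees.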

laanak08, Mar 16 '25 04:03