holmesgpt
ROB-2584: Eval for KRR tool
Results of HolmesGPT evals
- ask_holmes: 29/36 test cases were successful, 6 regressions, 1 setup failure
Legend
- :white_check_mark: the test was successful
- :minus: the test was skipped
- :warning: the test failed but is known to be flaky or known to fail
- :construction: the test had a setup failure (not a code regression)
- :wrench: the test failed due to mock data issues (not a code regression)
- :no_entry_sign: the test was throttled by API rate limits/overload
- :x: the test failed and should be fixed before merging the PR