holmesgpt
ROB-2230 random key for all tool calls
Results of HolmesGPT evals
- ask_holmes: 33/36 test cases were successful, 2 regressions, 1 setup failure
| Test suite | Test case | Status |
|---|---|---|
| ask | 01_how_many_pods | :white_check_mark: |
| ask | 02_what_is_wrong_with_pod | :white_check_mark: |
| ask | 04_related_k8s_events | :white_check_mark: |
| ask | 05_image_version | :white_check_mark: |
| ask | 09_crashpod | :white_check_mark: |
| ask | 10_image_pull_backoff | :white_check_mark: |
| ask | 110_k8s_events_image_pull | :white_check_mark: |
| ask | 11_init_containers | :white_check_mark: |
| ask | 13a_pending_node_selector_basic | :white_check_mark: |
| ask | 14_pending_resources | :white_check_mark: |
| ask | 15_failed_readiness_probe | :white_check_mark: |
| ask | 17_oom_kill | :white_check_mark: |
| ask | 18_oom_kill_from_issues_history | :white_check_mark: |
| ask | 19_detect_missing_app_details | :white_check_mark: |
| ask | 20_long_log_file_search | :x: |
| ask | 24_misconfigured_pvc | :x: |
| ask | 24a_misconfigured_pvc_basic | :white_check_mark: |
| ask | 28_permissions_error | :construction: |
| ask | 39_failed_toolset | :white_check_mark: |
| ask | 41_setup_argo | :white_check_mark: |
| ask | 42_dns_issues_steps_new_tools | :white_check_mark: |
| ask | 43_current_datetime_from_prompt | :white_check_mark: |
| ask | 45_fetch_deployment_logs_simple | :white_check_mark: |
| ask | 51_logs_summarize_errors | :white_check_mark: |
| ask | 53_logs_find_term | :white_check_mark: |
| ask | 54_not_truncated_when_getting_pods | :white_check_mark: |
| ask | 59_label_based_counting | :white_check_mark: |
| ask | 60_count_less_than | :white_check_mark: |
| ask | 61_exact_match_counting | :white_check_mark: |
| ask | 63_fetch_error_logs_no_errors | :white_check_mark: |
| ask | 79_configmap_mount_issue | :white_check_mark: |
| ask | 83_secret_not_found | :white_check_mark: |
| ask | 86_configmap_like_but_secret | :white_check_mark: |
| ask | 93_calling_datadog[0] | :white_check_mark: |
| ask | 93_calling_datadog[1] | :white_check_mark: |
| ask | 93_calling_datadog[2] | :white_check_mark: |
Legend
- :white_check_mark: the test was successful
- :minus: the test was skipped
- :warning: the test failed but is known to be flaky or known to fail
- :construction: the test had a setup failure (not a code regression)
- :wrench: the test failed due to mock data issues (not a code regression)
- :no_entry_sign: the test was throttled by API rate limits/overload
- :x: the test failed and should be fixed before merging the PR
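
To double-check the summary line against the table, the rows can be tallied by status. This is a minimal sketch, not part of the HolmesGPT eval tooling: the `BUCKETS` mapping, the `ROW` pattern, and the `tally` function are hypothetical names, and the bucket labels are only an assumed reading of the legend above.

```python
import re
from collections import Counter

# Hypothetical mapping from the legend's emoji shortcodes to summary buckets.
BUCKETS = {
    ":white_check_mark:": "successful",
    ":minus:": "skipped",
    ":warning:": "known failure",
    ":construction:": "setup failure",
    ":wrench:": "mock data issue",
    ":no_entry_sign:": "throttled",
    ":x:": "regression",
}

# Matches a data row such as "| ask | 09_crashpod | :white_check_mark: |".
# The header and separator rows do not end in an emoji shortcode, so they are skipped.
ROW = re.compile(
    r"^\|\s*(?P<suite>[^|]+?)\s*\|\s*(?P<case>[^|]+?)\s*\|\s*(?P<status>:[a-z_]+:)\s*\|$"
)


def tally(markdown: str) -> Counter:
    """Count table rows per bucket; unrecognized shortcodes are counted as 'unknown'."""
    counts: Counter = Counter()
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[BUCKETS.get(match.group("status"), "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        [
            "| Test suite | Test case | Status |",
            "|---|---|---|",
            "| ask | 01_how_many_pods | :white_check_mark: |",
            "| ask | 20_long_log_file_search | :x: |",
            "| ask | 28_permissions_error | :construction: |",
        ]
    )
    for bucket, count in sorted(tally(sample).items()):
        print(f"{bucket}: {count}")
```

Fed the full table from this comment, a tally of this kind should match the counts in the summary line if the assumed mapping is right.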