holmesgpt

ROB-2230 random key for all tool calls

Open · nherment opened this issue 1 month ago · 1 comment

nherment · Nov 28 '25 06:11

Results of HolmesGPT evals

  • ask_holmes: 33/36 test cases were successful, 2 regressions, 1 setup failure
| Test suite | Test case | Status |
| --- | --- | --- |
| ask | 01_how_many_pods | :white_check_mark: |
| ask | 02_what_is_wrong_with_pod | :white_check_mark: |
| ask | 04_related_k8s_events | :white_check_mark: |
| ask | 05_image_version | :white_check_mark: |
| ask | 09_crashpod | :white_check_mark: |
| ask | 10_image_pull_backoff | :white_check_mark: |
| ask | 110_k8s_events_image_pull | :white_check_mark: |
| ask | 11_init_containers | :white_check_mark: |
| ask | 13a_pending_node_selector_basic | :white_check_mark: |
| ask | 14_pending_resources | :white_check_mark: |
| ask | 15_failed_readiness_probe | :white_check_mark: |
| ask | 17_oom_kill | :white_check_mark: |
| ask | 18_oom_kill_from_issues_history | :white_check_mark: |
| ask | 19_detect_missing_app_details | :white_check_mark: |
| ask | 20_long_log_file_search | :x: |
| ask | 24_misconfigured_pvc | :x: |
| ask | 24a_misconfigured_pvc_basic | :white_check_mark: |
| ask | 28_permissions_error | :construction: |
| ask | 39_failed_toolset | :white_check_mark: |
| ask | 41_setup_argo | :white_check_mark: |
| ask | 42_dns_issues_steps_new_tools | :white_check_mark: |
| ask | 43_current_datetime_from_prompt | :white_check_mark: |
| ask | 45_fetch_deployment_logs_simple | :white_check_mark: |
| ask | 51_logs_summarize_errors | :white_check_mark: |
| ask | 53_logs_find_term | :white_check_mark: |
| ask | 54_not_truncated_when_getting_pods | :white_check_mark: |
| ask | 59_label_based_counting | :white_check_mark: |
| ask | 60_count_less_than | :white_check_mark: |
| ask | 61_exact_match_counting | :white_check_mark: |
| ask | 63_fetch_error_logs_no_errors | :white_check_mark: |
| ask | 79_configmap_mount_issue | :white_check_mark: |
| ask | 83_secret_not_found | :white_check_mark: |
| ask | 86_configmap_like_but_secret | :white_check_mark: |
| ask | 93_calling_datadog[0] | :white_check_mark: |
| ask | 93_calling_datadog[1] | :white_check_mark: |
| ask | 93_calling_datadog[2] | :white_check_mark: |

Legend

  • :white_check_mark: the test was successful
  • :minus: the test was skipped
  • :warning: the test failed but is known to be flaky or known to fail
  • :construction: the test had a setup failure (not a code regression)
  • :wrench: the test failed due to mock data issues (not a code regression)
  • :no_entry_sign: the test was throttled by API rate limits/overload
  • :x: the test failed and should be fixed before merging the PR
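The ask_holmes summary line can be cross-checked against the status column of the results table. The sketch below is a hypothetical helper, not HolmesGPT's actual eval reporter; it assumes, following the legend, that only :x: counts as a regression, that :construction: counts as a setup failure, and that the other legend statuses (:warning:, :wrench:, :no_entry_sign:, :minus:) fall into neither count.

```python
from collections import Counter

# Legend shortcodes that the summary line reports on.
PASS = ":white_check_mark:"
REGRESSION = ":x:"                # failed and should be fixed before merging
SETUP_FAILURE = ":construction:"  # setup failure, not a code regression

# Status column of the results table, in table order (36 test cases).
STATUSES = (
    [PASS] * 14             # 01_how_many_pods .. 19_detect_missing_app_details
    + [REGRESSION] * 2      # 20_long_log_file_search, 24_misconfigured_pvc
    + [PASS]                # 24a_misconfigured_pvc_basic
    + [SETUP_FAILURE]       # 28_permissions_error
    + [PASS] * 18           # 39_failed_toolset .. 93_calling_datadog[2]
)


def plural(n: int, word: str) -> str:
    """Return e.g. '1 setup failure' or '2 regressions'."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def summarize(statuses: list[str]) -> str:
    """Rebuild the one-line summary from a list of status shortcodes."""
    counts = Counter(statuses)  # missing shortcodes count as 0
    return (
        f"{counts[PASS]}/{len(statuses)} test cases were successful, "
        f"{plural(counts[REGRESSION], 'regression')}, "
        f"{plural(counts[SETUP_FAILURE], 'setup failure')}"
    )


print(summarize(STATUSES))
# 33/36 test cases were successful, 2 regressions, 1 setup failure
```

Under these assumptions the tally matches the reported 33/36: the two :x: rows are the regressions to fix before merging, while 28_permissions_error is excluded as a setup failure rather than a code regression.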

github-actions[bot] · Dec 07 '25 10:12