holmesgpt icon indicating copy to clipboard operation
holmesgpt copied to clipboard

ROB-1267: Unified Holmes logging

Open nherment opened this issue 7 months ago • 2 comments

Summary by CodeRabbit

  • New Features

    • Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
    • Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
    • Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
    • Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
  • Bug Fixes

    • Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
  • Refactor

    • Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
    • Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
    • Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
    • Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
    • Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
    • Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
    • Consolidated OpenSearch configuration and query building with simplified log formatting.
  • Tests

    • Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
    • Updated and removed obsolete test fixtures to align with new toolset behavior.
    • Enhanced test infrastructure with tracing spans and simplified evaluation span management.
    • Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
    • Added new integration tests for Grafana Loki and OpenSearch log fetching.
    • Added prompt rendering tests for log-fetching toolsets.
  • Documentation

    • Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
  • Chores

    • Removed deprecated log toolset configurations and related test data.
    • Consolidated environment variable handling in CI workflows.
    • Refined evaluation logic and test metadata for improved clarity and traceability.

nherment avatar May 09 '25 12:05 nherment

Summary by CodeRabbit

  • New Features

    • Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
    • Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
    • Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
    • Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
  • Bug Fixes

    • Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
  • Refactor

    • Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
    • Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
    • Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
    • Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
    • Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
    • Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
    • Consolidated OpenSearch configuration and query building with simplified log formatting.
  • Tests

    • Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
    • Updated and removed obsolete test fixtures to align with new toolset behavior.
    • Enhanced test infrastructure with tracing spans and simplified evaluation span management.
    • Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
    • Added new integration tests for Grafana Loki and OpenSearch log fetching.
    • Added prompt rendering tests for log-fetching toolsets.
  • Documentation

    • Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
  • Chores

    • Removed deprecated log toolset configurations and related test data.
    • Consolidated environment variable handling in CI workflows.
    • Refined evaluation logic and test metadata for improved clarity and traceability.

Summary by CodeRabbit

  • New Features

    • Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
    • Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
    • Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
    • Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
  • Bug Fixes

    • Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
  • Refactor

    • Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
    • Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
    • Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
    • Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
    • Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
    • Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
    • Consolidated OpenSearch configuration and query building with simplified log formatting.
  • Tests

    • Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
    • Updated and removed obsolete test fixtures to align with new toolset behavior.
    • Enhanced test infrastructure with tracing spans and simplified evaluation span management.
    • Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
    • Added new integration tests for Grafana Loki and OpenSearch log fetching.
    • Added prompt rendering tests for log-fetching toolsets.
  • Documentation

    • Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
  • Chores

    • Removed deprecated log toolset configurations and related test data.
    • Consolidated environment variable handling in CI workflows.
    • Refined evaluation logic and test metadata for improved clarity and traceability.

Walkthrough

This update introduces a unified, strongly typed logging API for Kubernetes pod log retrieval, refactoring major logging toolsets (Kubernetes, Coralogix, Grafana Loki, OpenSearch) to use a standard fetch_pod_logs interface and parameter model. It restructures toolset management, moves ToolExecutor to a new module, updates prompt templates, and revises test fixtures and evaluation logic accordingly.

Changes

File(s) / Path(s) Change Summary
holmes/plugins/toolsets/logging_utils/logging_api.py (new) Adds a unified, typed logging API with standard config, parameter, and toolset base classes for pod log retrieval.
holmes/plugins/toolsets/kubernetes_logs.py (new) Implements a new Kubernetes logs toolset using the unified API, with structured log parsing/filtering and error handling.
holmes/plugins/toolsets/coralogix/api.py, toolset_coralogix_logs.py, utils.py Refactors Coralogix toolset to use typed parameters and unified log-fetching interface; consolidates config and log processing utilities.
holmes/plugins/toolsets/grafana/toolset_grafana_loki.py, grafana_api.py, common.py, loki_api.py, base_grafana_toolset.py Refactors Grafana Loki toolset to use the unified logging API, simplifies health check logic, and updates log formatting.
holmes/plugins/toolsets/opensearch/opensearch_logs.py, opensearch_utils.py Refactors OpenSearch logs toolset to a single unified class using typed config and standardized query construction.
holmes/plugins/toolsets/utils.py Adds timestamp conversion utility for log filtering; updates default time span calculation logic.
holmes/plugins/toolsets/init.py, holmes/common/env_vars.py Adds legacy flag for Kubernetes logs toolset; updates toolset loading logic to respect the flag.
holmes/core/tools.py, holmes/core/tools_utils/tool_executor.py (new), holmes/core/tools_utils/toolset_utils.py (new), holmes/config.py, holmes/core/tool_calling_llm.py Removes ToolExecutor from tools.py, moves it to a new module, and adds a utility to filter logging toolsets. Updates imports accordingly.
holmes/core/conversations.py Ensures conversation history is copied before mutation in chat message building.
holmes/plugins/prompts/_default_log_prompt.jinja2 (new), _fetch_logs.jinja2, _general_instructions.jinja2 Adds and updates prompt templates and investigation instructions for the new logging API and tool usage.
holmes/plugins/toolsets/robusta/robusta_instructions.jinja2 Adds instructions for investigating issues by finding IDs.
examples/custom_llm.py, .github/workflows/llm-evaluation.yaml Updates imports for ToolExecutor; renames and simplifies workflow job.
tests/llm/utils/mock_toolset.py, mock_utils.py, classifiers.py, braintrust.py Refactors mock tool wrappers for span tracing, updates test case loading for conversation history, and enhances evaluation tracing and Braintrust integration.
tests/llm/test_ask_holmes.py, test_investigate.py, test_mocks.py Updates test functions for explicit span management and parent span propagation; updates ToolExecutor import.
tests/llm/fixtures/** (multiple) Replaces legacy log tool invocations with fetch_pod_logs, adds/updates fixtures for new logging API, updates test cases and expected outputs, and removes legacy or redundant files.
docs/evals-writing.md Adds documentation on evaluation tagging.

Changes Table (Condensed)

Area / Files Change Summary
Logging API & Toolsets: holmes/plugins/toolsets/logging_utils/logging_api.py, kubernetes_logs.py, coralogix/*, grafana/*, opensearch/* Introduces unified logging API, refactors all major log toolsets to use typed parameters and a consistent interface.
Toolset Management: holmes/core/tools.py, tools_utils/tool_executor.py, tools_utils/toolset_utils.py, holmes/config.py, holmes/core/tool_calling_llm.py Removes and relocates ToolExecutor, adds utility for filtering default logging toolsets, updates related imports.
Prompts & Instructions: holmes/plugins/prompts/_default_log_prompt.jinja2, _fetch_logs.jinja2, _general_instructions.jinja2, robusta_instructions.jinja2 Adds/updates prompt templates and investigation instructions for the new logging API and tool usage.
Test Infra & Mocks: tests/llm/utils/mock_toolset.py, mock_utils.py, classifiers.py, braintrust.py, test_ask_holmes.py, test_investigate.py, test_mocks.py Refactors for span-based tracing, conversation history support, and Braintrust integration in test evaluation.
Test Fixtures: tests/llm/fixtures/** Updates, adds, or deletes fixtures to use fetch_pod_logs and new logging API, revises test cases and expected outputs.
Miscellaneous: .github/workflows/llm-evaluation.yaml, docs/evals-writing.md, examples/custom_llm.py, holmes/core/conversations.py Workflow/job name update, documentation on tagging, import fixes, and defensive copy for conversation history.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LLM
    participant ToolExecutor
    participant Toolset (K8s/Coralogix/Loki/OpenSearch)
    participant LoggingBackend

    User->>LLM: User prompt (e.g., "Show logs for pod X")
    LLM->>ToolExecutor: invoke("fetch_pod_logs", params)
    ToolExecutor->>Toolset: fetch_pod_logs(params)
    Toolset->>LoggingBackend: Query logs (with typed params)
    LoggingBackend-->>Toolset: Log entries (structured)
    Toolset-->>ToolExecutor: StructuredToolResult (logs, status)
    ToolExecutor-->>LLM: StructuredToolResult
    LLM-->>User: Answer (with log excerpts/analysis)

Possibly related PRs

  • robusta-dev/holmesgpt#440: Refactors the Kubernetes logs toolset to add structured log entries, filtering, and formatting, directly relating to the new unified Kubernetes logs toolset introduced here.
  • robusta-dev/holmesgpt#421: Refactors the OpenSearch logs toolset to consolidate functionality into a single class using the unified logging API, which matches the OpenSearch toolset refactor in this PR.
  • robusta-dev/holmesgpt#429: Refactors the Coralogix logs toolset to use a unified logging API and typed parameters, directly related to the Coralogix refactor in this PR.

Suggested labels

enhancement

Suggested reviewers

  • arikalon1
  • moshemorad
✨ Finishing Touches
  • [ ] 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai[bot] avatar May 14 '25 08:05 coderabbitai[bot]

Results of HolmesGPT evals

Test suite Test case Status
ask_holmes 01_how_many_pods :warning:
ask_holmes 02_what_is_wrong_with_pod :white_check_mark:
ask_holmes 03_what_is_the_command_to_port_forward :white_check_mark:
ask_holmes 04_related_k8s_events :white_check_mark:
ask_holmes 05_image_version :white_check_mark:
ask_holmes 06_explain_issue :white_check_mark:
ask_holmes 07_high_latency :white_check_mark:
ask_holmes 08_sock_shop_frontend :x:
ask_holmes 09_crashpod :white_check_mark:
ask_holmes 10_image_pull_backoff :white_check_mark:
ask_holmes 11_init_containers :white_check_mark:
ask_holmes 12_job_crashing :white_check_mark:
ask_holmes 13_pending_node_selector :white_check_mark:
ask_holmes 14_pending_resources :white_check_mark:
ask_holmes 15_failed_readiness_probe :white_check_mark:
ask_holmes 16_failed_no_toolset_found :white_check_mark:
ask_holmes 17_oom_kill :white_check_mark:
ask_holmes 18_crash_looping_v2 :white_check_mark:
ask_holmes 19_detect_missing_app_details :white_check_mark:
ask_holmes 20_long_log_file_search :white_check_mark:
ask_holmes 21_job_fail_curl_no_svc_account :warning:
ask_holmes 22_high_latency_dbi_down :x:
ask_holmes 23_app_error_in_current_logs :white_check_mark:
ask_holmes 24_misconfigured_pvc :white_check_mark:
ask_holmes 25_misconfigured_ingress_class :white_check_mark:
ask_holmes 26_multi_container_logs :warning:
ask_holmes 27_permissions_error_no_helm_tools :white_check_mark:
ask_holmes 28_permissions_error_helm_tools_enabled :white_check_mark:
ask_holmes 29_events_from_alert_manager :white_check_mark:
ask_holmes 30_basic_promql_graph_cluster_memory :white_check_mark:
ask_holmes 31_basic_promql_graph_pod_memory :white_check_mark:
ask_holmes 32_basic_promql_graph_pod_cpu :white_check_mark:
ask_holmes 33_http_latency_graph :white_check_mark:
ask_holmes 34_memory_graph :white_check_mark:
ask_holmes 35_tempo :white_check_mark:
ask_holmes 36_argocd_find_resource :white_check_mark:
ask_holmes 37_argocd_wrong_namespace :warning:
ask_holmes 38_rabbitmq_split_head :white_check_mark:
ask_holmes 39_failed_toolset :white_check_mark:
ask_holmes 40_disabled_toolset :white_check_mark:
ask_holmes 41_setup_argo :white_check_mark:
ask_holmes 42_dns_issues_result_all_tools :warning:
ask_holmes 42_dns_issues_result_new_tools :warning:
ask_holmes 42_dns_issues_result_old_tools :warning:
ask_holmes 42_dns_issues_steps_new_all_tools :warning:
ask_holmes 42_dns_issues_steps_new_tools :warning:
ask_holmes 42_dns_issues_steps_old_tools :warning:
ask_holmes 43_current_datetime_from_prompt :white_check_mark:
ask_holmes 43_slack_deployment_logs :white_check_mark:
ask_holmes 44_slack_statefulset_logs :white_check_mark:
ask_holmes 45_fetch_deployment_logs_simple :white_check_mark:
ask_holmes 46_job_crashing_no_longer_exists :x:
ask_holmes 47_truncated_logs_context_window :x:
ask_holmes 48_logs_since_thursday :x:
ask_holmes 49_logs_since_last_week :white_check_mark:
ask_holmes 50_logs_since_specific_date :x:
ask_holmes 51_logs_summarize_errors :white_check_mark:
ask_holmes 52_logs_login_issues :x:
ask_holmes 53_logs_find_term :white_check_mark:
investigate 01_oom_kill :white_check_mark:
investigate 02_crashloop_backoff :white_check_mark:
investigate 03_cpu_throttling :white_check_mark:
investigate 04_image_pull_backoff :white_check_mark:
investigate 06_job_failure :white_check_mark:
investigate 07_job_syntax_error :white_check_mark:
investigate 08_memory_pressure :white_check_mark:
investigate 09_high_latency :white_check_mark:
investigate 10_KubeDeploymentReplicasMismatch :white_check_mark:
investigate 11_KubePodCrashLooping :white_check_mark:
investigate 12_KubePodNotReady :white_check_mark:
investigate 13_Watchdog :white_check_mark:
investigate 14_tempo :white_check_mark:

Legend

  • :white_check_mark: the test was successful
  • :warning: the test failed but is known to be flakky or known to fail
  • :x: the test failed and should be fixed before merging the PR

github-actions[bot] avatar Jun 24 '25 12:06 github-actions[bot]