ROB-1267: Unified Holmes logging
Summary by CodeRabbit
-
New Features
- Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
- Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
- Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
- Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
-
Bug Fixes
- Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
-
Refactor
- Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
- Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
- Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
- Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
- Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
- Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
- Consolidated OpenSearch configuration and query building with simplified log formatting.
-
Tests
- Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
- Updated and removed obsolete test fixtures to align with new toolset behavior.
- Enhanced test infrastructure with tracing spans and simplified evaluation span management.
- Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
- Added new integration tests for Grafana Loki and OpenSearch log fetching.
- Added prompt rendering tests for log-fetching toolsets.
-
Documentation
- Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
-
Chores
- Removed deprecated log toolset configurations and related test data.
- Consolidated environment variable handling in CI workflows.
- Refined evaluation logic and test metadata for improved clarity and traceability.
Summary by CodeRabbit
-
New Features
- Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
- Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
- Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
- Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
-
Bug Fixes
- Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
-
Refactor
- Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
- Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
- Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
- Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
- Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
- Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
- Consolidated OpenSearch configuration and query building with simplified log formatting.
-
Tests
- Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
- Updated and removed obsolete test fixtures to align with new toolset behavior.
- Enhanced test infrastructure with tracing spans and simplified evaluation span management.
- Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
- Added new integration tests for Grafana Loki and OpenSearch log fetching.
- Added prompt rendering tests for log-fetching toolsets.
-
Documentation
- Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
-
Chores
- Removed deprecated log toolset configurations and related test data.
- Consolidated environment variable handling in CI workflows.
- Refined evaluation logic and test metadata for improved clarity and traceability.
Summary by CodeRabbit
-
New Features
- Introduced a unified Kubernetes log-fetching tool supporting multi-container pods, substring filtering, and timestamp-based retrieval.
- Added detailed prompt templates and instructions to guide effective log investigation using the new log-fetching tool.
- Included new test fixtures and comprehensive unit tests validating Kubernetes log retrieval, timestamp filtering, and prompt rendering.
- Added support for Coralogix, Grafana Loki, and OpenSearch log integrations with standardized pod log fetching interfaces.
-
Bug Fixes
- Enhanced error handling and fallback mechanisms when pods are missing or logs are unavailable.
-
Refactor
- Replaced legacy Kubernetes log tool names with the new unified log-fetching tool across prompts and tests.
- Streamlined log toolset interfaces and internal logic for improved clarity and maintainability.
- Integrated tracing spans into mocking frameworks and evaluation utilities to enhance test observability.
- Refactored Coralogix logs toolset to use a shared logging API base and standardized parameter handling.
- Simplified Coralogix log formatting by removing timestamp prefixes and indentation.
- Refactored Grafana Loki and OpenSearch toolsets to unify pod log fetching under a common base class and typed parameters.
- Consolidated OpenSearch configuration and query building with simplified log formatting.
-
Tests
- Added extensive unit tests for Kubernetes log fetching and timestamp filtering.
- Updated and removed obsolete test fixtures to align with new toolset behavior.
- Enhanced test infrastructure with tracing spans and simplified evaluation span management.
- Improved Coralogix integration tests with environment validation and expanded log fetching scenarios.
- Added new integration tests for Grafana Loki and OpenSearch log fetching.
- Added prompt rendering tests for log-fetching toolsets.
-
Documentation
- Updated prompt instructions and test case configurations to reflect new log-fetching workflows and evaluation criteria.
-
Chores
- Removed deprecated log toolset configurations and related test data.
- Consolidated environment variable handling in CI workflows.
- Refined evaluation logic and test metadata for improved clarity and traceability.
Walkthrough
This update introduces a unified, strongly typed logging API for Kubernetes pod log retrieval, refactoring major logging toolsets (Kubernetes, Coralogix, Grafana Loki, OpenSearch) to use a standard fetch_pod_logs interface and parameter model. It restructures toolset management, moves ToolExecutor to a new module, updates prompt templates, and revises test fixtures and evaluation logic accordingly.
Changes
| File(s) / Path(s) | Change Summary |
|---|---|
| holmes/plugins/toolsets/logging_utils/logging_api.py (new) | Adds a unified, typed logging API with standard config, parameter, and toolset base classes for pod log retrieval. |
| holmes/plugins/toolsets/kubernetes_logs.py (new) | Implements a new Kubernetes logs toolset using the unified API, with structured log parsing/filtering and error handling. |
| holmes/plugins/toolsets/coralogix/api.py, toolset_coralogix_logs.py, utils.py | Refactors Coralogix toolset to use typed parameters and unified log-fetching interface; consolidates config and log processing utilities. |
| holmes/plugins/toolsets/grafana/toolset_grafana_loki.py, grafana_api.py, common.py, loki_api.py, base_grafana_toolset.py | Refactors Grafana Loki toolset to use the unified logging API, simplifies health check logic, and updates log formatting. |
| holmes/plugins/toolsets/opensearch/opensearch_logs.py, opensearch_utils.py | Refactors OpenSearch logs toolset to a single unified class using typed config and standardized query construction. |
| holmes/plugins/toolsets/utils.py | Adds timestamp conversion utility for log filtering; updates default time span calculation logic. |
| holmes/plugins/toolsets/init.py, holmes/common/env_vars.py | Adds legacy flag for Kubernetes logs toolset; updates toolset loading logic to respect the flag. |
| holmes/core/tools.py, holmes/core/tools_utils/tool_executor.py (new), holmes/core/tools_utils/toolset_utils.py (new), holmes/config.py, holmes/core/tool_calling_llm.py | Removes ToolExecutor from tools.py, moves it to a new module, and adds a utility to filter logging toolsets. Updates imports accordingly. |
| holmes/core/conversations.py | Ensures conversation history is copied before mutation in chat message building. |
| holmes/plugins/prompts/_default_log_prompt.jinja2 (new), _fetch_logs.jinja2, _general_instructions.jinja2 | Adds and updates prompt templates and investigation instructions for the new logging API and tool usage. |
| holmes/plugins/toolsets/robusta/robusta_instructions.jinja2 | Adds instructions for investigating issues by finding IDs. |
| examples/custom_llm.py, .github/workflows/llm-evaluation.yaml | Updates imports for ToolExecutor; renames and simplifies workflow job. |
| tests/llm/utils/mock_toolset.py, mock_utils.py, classifiers.py, braintrust.py | Refactors mock tool wrappers for span tracing, updates test case loading for conversation history, and enhances evaluation tracing and Braintrust integration. |
| tests/llm/test_ask_holmes.py, test_investigate.py, test_mocks.py | Updates test functions for explicit span management and parent span propagation; updates ToolExecutor import. |
| tests/llm/fixtures/** (multiple) | Replaces legacy log tool invocations with fetch_pod_logs, adds/updates fixtures for new logging API, updates test cases and expected outputs, and removes legacy or redundant files. |
| docs/evals-writing.md | Adds documentation on evaluation tagging. |
Changes Table (Condensed)
| Area / Files | Change Summary |
|---|---|
Logging API & Toolsets: holmes/plugins/toolsets/logging_utils/logging_api.py, kubernetes_logs.py, coralogix/*, grafana/*, opensearch/* |
Introduces unified logging API, refactors all major log toolsets to use typed parameters and a consistent interface. |
Toolset Management: holmes/core/tools.py, tools_utils/tool_executor.py, tools_utils/toolset_utils.py, holmes/config.py, holmes/core/tool_calling_llm.py |
Removes and relocates ToolExecutor, adds utility for filtering default logging toolsets, updates related imports. |
Prompts & Instructions: holmes/plugins/prompts/_default_log_prompt.jinja2, _fetch_logs.jinja2, _general_instructions.jinja2, robusta_instructions.jinja2 |
Adds/updates prompt templates and investigation instructions for the new logging API and tool usage. |
Test Infra & Mocks: tests/llm/utils/mock_toolset.py, mock_utils.py, classifiers.py, braintrust.py, test_ask_holmes.py, test_investigate.py, test_mocks.py |
Refactors for span-based tracing, conversation history support, and Braintrust integration in test evaluation. |
Test Fixtures: tests/llm/fixtures/** |
Updates, adds, or deletes fixtures to use fetch_pod_logs and new logging API, revises test cases and expected outputs. |
Miscellaneous: .github/workflows/llm-evaluation.yaml, docs/evals-writing.md, examples/custom_llm.py, holmes/core/conversations.py |
Workflow/job name update, documentation on tagging, import fixes, and defensive copy for conversation history. |
Sequence Diagram(s)
sequenceDiagram
participant User
participant LLM
participant ToolExecutor
participant Toolset (K8s/Coralogix/Loki/OpenSearch)
participant LoggingBackend
User->>LLM: User prompt (e.g., "Show logs for pod X")
LLM->>ToolExecutor: invoke("fetch_pod_logs", params)
ToolExecutor->>Toolset: fetch_pod_logs(params)
Toolset->>LoggingBackend: Query logs (with typed params)
LoggingBackend-->>Toolset: Log entries (structured)
Toolset-->>ToolExecutor: StructuredToolResult (logs, status)
ToolExecutor-->>LLM: StructuredToolResult
LLM-->>User: Answer (with log excerpts/analysis)
Possibly related PRs
- robusta-dev/holmesgpt#440: Refactors the Kubernetes logs toolset to add structured log entries, filtering, and formatting, directly relating to the new unified Kubernetes logs toolset introduced here.
- robusta-dev/holmesgpt#421: Refactors the OpenSearch logs toolset to consolidate functionality into a single class using the unified logging API, which matches the OpenSearch toolset refactor in this PR.
- robusta-dev/holmesgpt#429: Refactors the Coralogix logs toolset to use a unified logging API and typed parameters, directly related to the Coralogix refactor in this PR.
Suggested labels
enhancement
Suggested reviewers
- arikalon1
- moshemorad
✨ Finishing Touches
- [ ] 📝 Generate Docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>, please review it.Explain this complex logic.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. Examples:@coderabbitai explain this code block.@coderabbitai modularize this function.
- PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.@coderabbitai read src/utils.ts and explain its main purpose.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.@coderabbitai help me debug CodeRabbit configuration file.
Support
Need help? Create a ticket on our support page for assistance with any issues or questions.
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (Invoked using PR comments)
@coderabbitai pauseto pause the reviews on a PR.@coderabbitai resumeto resume the paused reviews.@coderabbitai reviewto trigger an incremental review. This is useful when automatic reviews are disabled for the repository.@coderabbitai full reviewto do a full review from scratch and review all the files again.@coderabbitai summaryto regenerate the summary of the PR.@coderabbitai generate docstringsto generate docstrings for this PR.@coderabbitai generate sequence diagramto generate a sequence diagram of the changes in this PR.@coderabbitai resolveresolve all the CodeRabbit review comments.@coderabbitai configurationto show the current CodeRabbit configuration for the repository.@coderabbitai helpto get help.
Other keywords and placeholders
- Add
@coderabbitai ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaianywhere in the PR title to generate the title automatically.
CodeRabbit Configuration File (.coderabbit.yaml)
- You can programmatically configure CodeRabbit by adding a
.coderabbit.yamlfile to the root of your repository. - Please see the configuration documentation for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Documentation and Community
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
Results of HolmesGPT evals
- ask_holmes: 42/59 test cases were successful
- investigate: 13/13 test cases were successful
Legend
- :white_check_mark: the test was successful
- :warning: the test failed but is known to be flakky or known to fail
- :x: the test failed and should be fixed before merging the PR