holmesgpt
ROB-2161 improved truncation methodology
Walkthrough
Adds StructuredToolResult.llm_data and per-tool token accounting (ToolCallResult.size); integrates hierarchical Prometheus and New Relic metric compression into tool flows; updates context-window limiter to use per-call sizes; and introduces tests, an HTTP traffic generator fixture, and a Kubernetes manifest for end-to-end testing.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Tool result model**<br>`holmes/core/tools.py` | Add public field `llm_data: Optional[str] = None` to `StructuredToolResult`. |
| **Result formatting logic**<br>`holmes/core/models.py` | In `format_tool_result_data`, if `tool_result.llm_data` exists and `tool_result.data` is a JSON string containing both `"random_key"` and `"data"`, parse and shallow-copy the object, replace the inner `"data"` with `llm_data`, set `llm_data=None`, swallow exceptions, then proceed with the existing string/non-string handling. |
| **Per-tool-call token accounting**<br>`holmes/core/tool_calling_llm.py` | Construct `ToolCallResult`, build the message representation, count tokens via the LLM, set `ToolCallResult.size`, include `token_count` in tool-call logs, and return the enriched result. `ToolCallResult` gains a public `size` field. |
| **Context window limiter**<br>`holmes/core/tools_utils/tool_context_window_limiter.py` | Lazily compute `tool_call_result.size` when unset, use `size` for threshold/percentage calculations and error reporting, pass `size` to Sentry metadata, and clear `size` after error handling. |
| **Prometheus compression module (new)**<br>`holmes/plugins/toolsets/prometheus/data_compression.py` | New module with `CompressedMetric`, `Group`, `PreFilteredMetrics`, conversion/grouping/prefiltering, and rendering/summarization helpers (`format_labels`, `format_data`, `compact_metrics`, etc.). |
| **Prometheus toolset integration**<br>`holmes/plugins/toolsets/prometheus/prometheus.py` | Add config fields `compress_range_metrics` and `compress_range_metrics_minimum_ratio`; convert range results to compressed metrics and compute original vs. compressed sizes and their ratio; set `StructuredToolResult.llm_data` to the compressed payload when beneficial and attach `compression_info` and `raw_data`; otherwise keep the original data and produce summaries for overly large responses. |
| **New Relic metrics compression**<br>`holmes/plugins/toolsets/newrelic/newrelic.py` | Add a `compact_metrics_data` flow, change `to_prometheus_records` to return `PromResponse`, attach `llm_data` when compression is beneficial, and add config fields `compact_metrics` / `compact_metrics_minimum_ratio`. |
| **Tests for compression**<br>`tests/plugins/toolsets/prometheus/test_data_compression.py`, `.../raw_prometheus_data.json` | New unit tests for label-frequency detection, grouping, formatting, and realistic summarization/ratio assertions exercising the compression helpers. |
| **LLM test fixture & toolset config**<br>`tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml`, `.../toolsets.yaml` | New test case describing periodic 404s, setup/teardown steps, and a toolset fixture pointing Prometheus to `http://localhost:9160`. |
| **Traffic generator & K8s manifest (fixtures)**<br>`tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py`, `.../manifest.yaml` | New high-cardinality HTTP traffic generator module and a Kubernetes manifest (namespace, secret embedding the script, deployments, Service, ServiceMonitor, traffic simulator) for end-to-end testing. |
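The `llm_data` substitution described for `format_tool_result_data` can be sketched roughly as follows. This is an illustrative reading of the summary above, not the actual implementation: the function name `substitute_llm_data` and the exact guard conditions are assumptions.

```python
import json
from typing import Optional


def substitute_llm_data(data: Optional[str], llm_data: Optional[str]) -> str:
    """Hypothetical sketch: if `data` is a JSON object containing both
    "random_key" and "data", swap the inner "data" value for the compact
    `llm_data` payload; on any failure, fall back to the original data."""
    if llm_data is None or data is None:
        return data or ""
    try:
        parsed = json.loads(data)
        if isinstance(parsed, dict) and "random_key" in parsed and "data" in parsed:
            replaced = dict(parsed)  # shallow copy, as described above
            replaced["data"] = llm_data
            return json.dumps(replaced)
    except (json.JSONDecodeError, TypeError):
        pass  # swallow exceptions and keep the original payload
    return data


original = json.dumps({"random_key": "abc123", "data": "<huge raw metrics>"})
compact = substitute_llm_data(original, "<compressed metrics>")
```

Note the fallback behavior: anything that is not a JSON object with both keys passes through untouched, which matches the "swallow exceptions, then proceed with existing handling" description.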
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Tool as PrometheusTool
    participant Prom as Prometheus API
    participant DC as DataCompression
    participant FM as Formatter
    Note over Tool,DC: Range query execution with optional compression
    User->>Tool: ExecuteRangeQuery(params)
    Tool->>Prom: HTTP range query
    Prom-->>Tool: Raw JSON result
    Tool->>DC: Convert raw -> CompressedMetric[]
    DC-->>Tool: Compressed metrics + rendered payload
    Tool->>Tool: Compute original_size, compressed_size, ratio
    alt Compression beneficial
        Tool->>FM: Use compressed payload as llm_data
        FM-->>Tool: llm_data string
        Note right of Tool: Set StructuredToolResult.llm_data
    else Not beneficial
        Note right of Tool: Keep original data, attach compression_info
    end
    opt Response too large
        Tool->>FM: create_data_summary_for_large_result(...)
        FM-->>Tool: Summary text
    end
    Tool-->>User: StructuredToolResult { data, llm_data?, compression_info? }
```
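The "compression beneficial" branch in the diagram above boils down to a size-ratio check. A minimal sketch, where the config name `compress_range_metrics_minimum_ratio` comes from the changes table but everything else (function and field names, size measured in characters) is assumed:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RangeQueryResult:
    data: str
    llm_data: Optional[str] = None
    compression_info: Dict[str, Any] = field(default_factory=dict)


def maybe_attach_llm_data(
    raw: str, compressed: str, minimum_ratio: float = 2.0
) -> RangeQueryResult:
    """Attach the compressed payload as llm_data only when the size
    reduction meets the configured minimum ratio; always record stats."""
    original_size = len(raw)
    compressed_size = len(compressed)
    ratio = original_size / max(compressed_size, 1)
    result = RangeQueryResult(data=raw)
    result.compression_info = {
        "original_size": original_size,
        "compressed_size": compressed_size,
        "ratio": round(ratio, 2),
    }
    if ratio >= minimum_ratio:  # only worth swapping payloads past the threshold
        result.llm_data = compressed
    return result


result = maybe_attach_llm_data("x" * 1000, "y" * 100)
```

The point of the threshold is that a marginal reduction is not worth diverging the LLM-visible payload from the raw data, so `llm_data` stays unset when the ratio falls short.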
```mermaid
sequenceDiagram
    autonumber
    participant Runner
    participant Tool as Tool Runner
    participant LLM as LLM Client
    participant Log as Logger
    Note over Tool,LLM: Per-tool-call token counting
    Runner->>Tool: Invoke tool
    Tool->>Tool: Build ToolCallResult
    Tool->>LLM: as_tool_call_message -> count_tokens()
    LLM-->>Tool: token_count
    Tool->>Tool: Set ToolCallResult.size = token_count
    Tool->>Log: Log metadata { token_count: size, ... }
    Tool-->>Runner: ToolCallResult(size, ...)
```
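The token-accounting step above can be sketched like this. The whitespace counter is a stand-in for the real LLM `count_tokens` call, and the class shape is illustrative, not the repository's actual `ToolCallResult`:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCallResult:
    tool_name: str
    data: str
    size: Optional[int] = None  # token count, filled in after counting

    def as_tool_call_message(self) -> str:
        # Build the message representation that the LLM will actually see.
        return f"{self.tool_name}: {self.data}"


def enrich_with_size(
    result: ToolCallResult, count_tokens: Callable[[str], int]
) -> ToolCallResult:
    """Count tokens for the message representation, store it on the
    result, and include it in the tool-call log metadata."""
    result.size = count_tokens(result.as_tool_call_message())
    print({"tool": result.tool_name, "token_count": result.size})
    return result


naive_counter = lambda text: len(text.split())  # stand-in for llm.count_tokens
call = enrich_with_size(ToolCallResult("kubectl_get_pods", "pod-a Running"), naive_counter)
```

Counting the rendered message rather than the raw data matters because the message wrapper itself consumes context-window tokens.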
```mermaid
sequenceDiagram
    autonumber
    participant Limiter as ContextWindowLimiter
    participant Res as ToolCallResult
    participant LLM as LLM Client
    Note over Limiter,Res: Size-based window check with lazy init
    Limiter->>Res: Read size
    alt size unset
        Limiter->>LLM: count_tokens(as_tool_call_message)
        LLM-->>Limiter: token_count
        Limiter->>Res: Set size = token_count
    end
    alt size > max_allowed
        Limiter-->>Limiter: Reject / truncate per policy
    else
        Limiter-->>Limiter: Accept
    end
```
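The lazy-initialization pattern in the limiter diagram might look like the following sketch; the threshold policy and all names here are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallResult:
    message: str
    size: Optional[int] = None  # token count; may already be set upstream


def check_context_window(
    result: CallResult, count_tokens: Callable[[str], int], max_tokens: int
) -> bool:
    """Return True if the result fits the window, computing size lazily
    so an upstream count (e.g. from per-tool-call accounting) is reused."""
    if result.size is None:  # lazy init: only count when unset
        result.size = count_tokens(result.message)
    return result.size <= max_tokens


fresh = CallResult("a b c")
fits = check_context_window(fresh, lambda t: len(t.split()), max_tokens=10)
```

Reusing an already-set `size` avoids a second `count_tokens` round trip for results that were counted when the tool call completed.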
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
- robusta-dev/holmesgpt#975 — Related Prometheus toolset changes; range-query handling overlaps with the compression/`llm_data` integration.
- robusta-dev/holmesgpt#963 — Overlaps with the per-tool LLM token-counting and logging changes in `holmes/core/tool_calling_llm.py`.
- robusta-dev/holmesgpt#919 — Touches `format_tool_result_data` and serialization surfaces; overlaps with the `llm_data`-aware formatting logic.
Suggested reviewers
- moshemorad
- aantn
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description is missing, so reviewers have no summary of changes that span multiple modules and introduce significant new functionality. | Please add a description outlining the goals of this change, such as `llm_data`-aware JSON formatting, per-tool token accounting, and the new Prometheus/New Relic compression pipelines, so reviewers can align on the intended enhancements. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title Check | ❓ Inconclusive | The title "ROB-2161 improved truncation methodology" is too generic for a change that adds JSON `llm_data` parsing, per-tool token accounting, and new compression pipelines for the Prometheus and New Relic integrations; it does not convey the scope to a reviewer scanning history. | Please rename the pull request to summarize the main change, for example "Enhance LLM response truncation with JSON llm_data parsing, token counting, and compression utilities." |
✨ Finishing touches
- [ ] 📝 Generate Docstrings

🧪 Generate unit tests
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch `ROB-2161_improved_truncation_methodology`
Comment @coderabbitai help to get the list of available commands and usage tips.
Results of HolmesGPT evals
- ask_holmes: 33/36 test cases were successful, 1 regression, 1 setup failure
| Test suite | Test case | Status |
|---|---|---|
| ask | 01_how_many_pods | :white_check_mark: |
| ask | 02_what_is_wrong_with_pod | :white_check_mark: |
| ask | 04_related_k8s_events | :white_check_mark: |
| ask | 05_image_version | :white_check_mark: |
| ask | 09_crashpod | :white_check_mark: |
| ask | 10_image_pull_backoff | :white_check_mark: |
| ask | 110_k8s_events_image_pull | :white_check_mark: |
| ask | 11_init_containers | :white_check_mark: |
| ask | 13a_pending_node_selector_basic | :white_check_mark: |
| ask | 14_pending_resources | :white_check_mark: |
| ask | 15_failed_readiness_probe | :white_check_mark: |
| ask | 17_oom_kill | :white_check_mark: |
| ask | 19_detect_missing_app_details | :white_check_mark: |
| ask | 20_long_log_file_search | :white_check_mark: |
| ask | 24_misconfigured_pvc | :white_check_mark: |
| ask | 24a_misconfigured_pvc_basic | :white_check_mark: |
| ask | 28_permissions_error | :construction: |
| ask | 33_cpu_metrics_discovery | :x: |
| ask | 39_failed_toolset | :white_check_mark: |
| ask | 41_setup_argo | :white_check_mark: |
| ask | 42_dns_issues_steps_new_tools | :warning: |
| ask | 43_current_datetime_from_prompt | :white_check_mark: |
| ask | 45_fetch_deployment_logs_simple | :white_check_mark: |
| ask | 51_logs_summarize_errors | :white_check_mark: |
| ask | 53_logs_find_term | :white_check_mark: |
| ask | 54_not_truncated_when_getting_pods | :white_check_mark: |
| ask | 59_label_based_counting | :white_check_mark: |
| ask | 60_count_less_than | :white_check_mark: |
| ask | 61_exact_match_counting | :white_check_mark: |
| ask | 63_fetch_error_logs_no_errors | :white_check_mark: |
| ask | 79_configmap_mount_issue | :white_check_mark: |
| ask | 83_secret_not_found | :white_check_mark: |
| ask | 86_configmap_like_but_secret | :white_check_mark: |
| ask | 93_calling_datadog[0] | :white_check_mark: |
| ask | 93_calling_datadog[1] | :white_check_mark: |
| ask | 93_calling_datadog[2] | :white_check_mark: |
Legend
- :white_check_mark: the test was successful
- :minus: the test was skipped
- :warning: the test failed but is known to be flaky or known to fail
- :construction: the test had a setup failure (not a code regression)
- :wrench: the test failed due to mock data issues (not a code regression)
- :no_entry_sign: the test was throttled by API rate limits/overload
- :x: the test failed and should be fixed before merging the PR