
ROB-2161 improved truncation methodology

Open · nherment opened this issue 3 months ago · 2 comments

nherment · Sep 19 '25 14:09

Walkthrough

Adds StructuredToolResult.llm_data and per-tool token accounting (ToolCallResult.size); integrates hierarchical Prometheus and New Relic metric compression into tool flows; updates context-window limiter to use per-call sizes; and introduces tests, an HTTP traffic generator fixture, and a Kubernetes manifest for end-to-end testing.

Changes

Cohort / File(s) Summary
Tool result model
holmes/core/tools.py
Add public field llm_data: Optional[str] = None to StructuredToolResult.
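A minimal sketch of the new field follows. The real StructuredToolResult in holmes/core/tools.py is a pydantic model with more fields; a dataclass stand-in is used here to keep the example self-contained, and the `data` field is an assumption about the existing model.

```python
# Sketch only: dataclass stand-in for the pydantic StructuredToolResult.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructuredToolResult:
    data: Optional[str] = None      # existing raw payload field (assumed)
    llm_data: Optional[str] = None  # new: compact view shown to the LLM

result = StructuredToolResult(data="full raw payload", llm_data="compact view")
```

When `llm_data` is None, downstream formatting falls back to `data`, so existing tools are unaffected.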
Result formatting logic
holmes/core/models.py
In format_tool_result_data, when tool_result.llm_data is set and tool_result.data is a JSON string containing both "random_key" and "data" keys, parse the JSON, shallow-copy the object, replace its inner "data" value with llm_data, clear llm_data, swallow any parsing exceptions, and then proceed with the existing string/non-string handling.
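The substitution described above can be sketched as a stand-alone helper. `substitute_llm_data` is a hypothetical name, not the actual signature of format_tool_result_data:

```python
# Illustrative sketch of the llm_data substitution; assumes the "random_key"
# / "data" envelope shape described in the walkthrough.
import json
from typing import Optional

def substitute_llm_data(data: Optional[str], llm_data: Optional[str]) -> Optional[str]:
    if not llm_data or not isinstance(data, str):
        return data
    try:
        parsed = json.loads(data)
        if isinstance(parsed, dict) and "random_key" in parsed and "data" in parsed:
            replaced = dict(parsed)       # shallow copy of the envelope
            replaced["data"] = llm_data   # swap the bulky payload for the compact one
            return json.dumps(replaced)
    except Exception:
        pass  # swallow parse errors and fall through to the original data
    return data
```

Non-JSON or differently-shaped payloads pass through untouched, which keeps the change backward compatible.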
Per-tool-call token accounting
holmes/core/tool_calling_llm.py
Construct ToolCallResult, build message representation, count tokens via LLM, set ToolCallResult.size, include token_count in tool-call logs, and return the enriched result. ToolCallResult gains public size.
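The accounting flow above can be sketched as follows; `count_tokens`, the message shape, and the field set on `ToolCallResult` are illustrative assumptions, not the repository's actual API:

```python
# Sketch of per-tool-call token accounting. count_tokens is a stand-in
# for the LLM client's tokenizer.
from dataclasses import dataclass
from typing import Optional

def count_tokens(text: str) -> int:
    return len(text.split())  # toy tokenizer: whitespace words

@dataclass
class ToolCallResult:
    tool_name: str
    result: str
    size: Optional[int] = None  # new: token count of the rendered tool message

    def as_tool_call_message(self) -> str:
        return f"{self.tool_name}: {self.result}"

call = ToolCallResult("kubectl_get_pods", "pod-a Running")
call.size = count_tokens(call.as_tool_call_message())
```

Storing the count on the result means later consumers (logging, the context-window limiter) read it instead of re-tokenizing.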
Context window limiter
holmes/core/tools_utils/tool_context_window_limiter.py
Lazily compute tool_call_result.size when unset, use size for threshold/percentage calculations and error reporting, pass size to Sentry metadata, and clear size after error handling.
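A sketch of the limiter's lazy-size flow, under the same illustrative assumptions (a minimal `Call` stand-in and a caller-supplied tokenizer):

```python
# Sketch of the size-based window check with lazy initialization.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Call:  # minimal stand-in for ToolCallResult
    message: str
    size: Optional[int] = None

def within_context_window(call: Call, max_tokens: int,
                          count_tokens: Callable[[str], int]) -> bool:
    if call.size is None:
        call.size = count_tokens(call.message)  # lazily compute when unset
    if call.size > max_tokens:
        call.size = None  # clear after error handling so it is recomputed later
        return False
    return True

ok = within_context_window(Call("hello world"), 100, lambda s: len(s.split()))
```

Clearing `size` on rejection matters because error handling may truncate the result, invalidating the cached count.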
Prometheus compression module (new)
holmes/plugins/toolsets/prometheus/data_compression.py
New module with CompressedMetric, Group, PreFilteredMetrics, conversion/grouping/prefiltering, and rendering/summarization helpers (format_labels, format_data, compact_metrics, etc.).
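The core idea behind hierarchical label compression can be shown with a toy helper: labels shared by every series are factored out and emitted once instead of repeated per series. `split_shared_labels` is illustrative and not the module's actual API:

```python
# Toy illustration of label factoring for metric compression.
from typing import Dict, List, Tuple

def split_shared_labels(
    series: List[Dict[str, str]]
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    shared = dict(series[0])
    for labels in series[1:]:
        # Keep only label pairs present with the same value in every series.
        shared = {k: v for k, v in shared.items() if labels.get(k) == v}
    remainders = [{k: v for k, v in labels.items() if k not in shared}
                  for labels in series]
    return shared, remainders

series = [
    {"job": "api", "pod": "api-1", "status": "404"},
    {"job": "api", "pod": "api-2", "status": "404"},
    {"job": "api", "pod": "api-3", "status": "500"},
]
shared, rest = split_shared_labels(series)
```

On high-cardinality Prometheus results this kind of factoring is where most of the size reduction comes from, since label sets dominate the serialized payload.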
Prometheus toolset integration
holmes/plugins/toolsets/prometheus/prometheus.py
Add config fields compress_range_metrics and compress_range_metrics_minimum_ratio; convert range results to compressed metrics, compute original vs compressed sizes and ratio; set StructuredToolResult.llm_data to compressed payload when beneficial and attach compression_info and raw_data; otherwise keep original data and produce summaries for overly-large responses.
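The "attach llm_data only when beneficial" decision can be sketched as a ratio check; the function name, the default threshold, and the shape of the info dict are assumptions for illustration:

```python
# Sketch of the beneficial-compression check gating llm_data attachment.
from typing import Optional, Tuple

def choose_payload(original: str, compressed: str,
                   minimum_ratio: float = 2.0) -> Tuple[Optional[str], Optional[dict]]:
    ratio = len(original) / max(len(compressed), 1)
    if ratio >= minimum_ratio:
        # Compression pays off: return the compact payload plus metadata.
        return compressed, {"original_size": len(original),
                            "compressed_size": len(compressed),
                            "ratio": round(ratio, 2)}
    return None, None  # keep the original data untouched
```

Gating on a minimum ratio avoids paying the indirection cost of llm_data when compression barely helps.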
New Relic metrics compression
holmes/plugins/toolsets/newrelic/newrelic.py
Add compact_metrics_data flow, change to_prometheus_records to return PromResponse, attach llm_data when compression is beneficial, and add config fields compact_metrics / compact_metrics_minimum_ratio.
Tests for compression
tests/plugins/toolsets/prometheus/test_data_compression.py, .../raw_prometheus_data.json
New unit tests for label-frequency detection, grouping, formatting, and realistic summarization/ratio assertions exercising the compression helpers.
LLM test fixture & toolset config
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml, .../toolsets.yaml
New test case describing periodic 404s, setup/teardown steps, and toolset fixture pointing Prometheus to http://localhost:9160.
Traffic generator & K8s manifest (fixtures)
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py, .../manifest.yaml
New high-cardinality HTTP traffic generator module and Kubernetes manifest (namespace, secret embedding script, deployments, Service, ServiceMonitor, traffic-simulator) for end-to-end testing.
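In the spirit of that fixture, a high-cardinality 404 generator can be sketched as below; the paths, the 404 cadence, and the function name are made up for illustration:

```python
# Illustrative traffic pattern: mostly 200s over many distinct paths, with
# periodic requests to missing paths that yield 404s.
import random

def generate_requests(n: int, not_found_every: int = 5):
    """Yield (path, status) pairs; every not_found_every-th request
    targets a missing path, producing a 404."""
    for i in range(n):
        if i % not_found_every == 0:
            yield (f"/missing/{random.randint(0, 10_000)}", 404)
        else:
            yield (f"/api/item/{random.randint(0, 10_000)}", 200)

statuses = [status for _, status in generate_requests(20)]
```

Randomized path suffixes create many distinct label values when the target scrapes per-path metrics, which is exactly the cardinality the compression tests exercise.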

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Tool as PrometheusTool
  participant Prom as Prometheus API
  participant DC as DataCompression
  participant FM as Formatter
  Note over Tool,DC: Range query execution with optional compression
  User->>Tool: ExecuteRangeQuery(params)
  Tool->>Prom: HTTP range query
  Prom-->>Tool: Raw JSON result
  Tool->>DC: Convert raw -> CompressedMetric[]
  DC-->>Tool: Compressed metrics + rendered payload
  Tool->>Tool: Compute original_size, compressed_size, ratio
  alt Compression beneficial
    Tool->>FM: Use compressed payload as llm_data
    FM-->>Tool: llm_data string
    Note right of Tool: Set StructuredToolResult.llm_data
  else Not beneficial
    Note right of Tool: Keep original data, attach compression_info
  end
  opt Response too large
    Tool->>FM: create_data_summary_for_large_result(...)
    FM-->>Tool: Summary text
  end
  Tool-->>User: StructuredToolResult { data, llm_data?, compression_info? }

sequenceDiagram
  autonumber
  participant Runner
  participant Tool as Tool Runner
  participant LLM as LLM Client
  participant Log as Logger
  Note over Tool,LLM: Per-tool-call token counting
  Runner->>Tool: Invoke tool
  Tool->>Tool: Build ToolCallResult
  Tool->>LLM: as_tool_call_message -> count_tokens()
  LLM-->>Tool: token_count
  Tool->>Tool: Set ToolCallResult.size = token_count
  Tool->>Log: Log metadata { token_count: size, ... }
  Tool-->>Runner: ToolCallResult(size, ...)

sequenceDiagram
  autonumber
  participant Limiter as ContextWindowLimiter
  participant Res as ToolCallResult
  participant LLM as LLM Client
  Note over Limiter,Res: Size-based window check with lazy init
  Limiter->>Res: Read size
  alt size unset
    Limiter->>LLM: count_tokens(as_tool_call_message)
    LLM-->>Limiter: token_count
    Limiter->>Res: Set size = token_count
  end
  alt size > max_allowed
    Limiter-->>Limiter: Reject / truncate per policy
  else
    Limiter-->>Limiter: Accept
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • robusta-dev/holmesgpt#975 — Related Prometheus toolset changes and range-query handling overlap with compression/llm_data integration.
  • robusta-dev/holmesgpt#963 — Overlaps per-tool LLM token-counting and logging changes in holmes/core/tool_calling_llm.py.
  • robusta-dev/holmesgpt#919 — Touches format_tool_result_data and serialization surfaces; overlaps llm_data-aware formatting logic.

Suggested reviewers

  • moshemorad
  • aantn

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is missing, so reviewers have no summary of changes that span multiple modules and introduce significant new functionality. Please add a description outlining the goals of this change, such as llm_data-aware result formatting, per-tool-call token tracking, and the new Prometheus/New Relic compression features.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
Title Check ❓ Inconclusive The title "ROB-2161 improved truncation methodology" is too generic to convey the scope of this pull request, which includes llm_data-aware formatting, per-call token counting, and new compression pipelines for the Prometheus and New Relic integrations. Please rename the pull request to summarize the main change, for example "Enhance LLM response truncation with JSON llm_data parsing, token counting, and compression utilities."
✨ Finishing touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch ROB-2161_improved_truncation_methodology

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] · Sep 19 '25 14:09

Results of HolmesGPT evals

  • ask_holmes: 33/36 test cases were successful, 1 regression, 1 setup failure
Test suite Test case Status
ask 01_how_many_pods :white_check_mark:
ask 02_what_is_wrong_with_pod :white_check_mark:
ask 04_related_k8s_events :white_check_mark:
ask 05_image_version :white_check_mark:
ask 09_crashpod :white_check_mark:
ask 10_image_pull_backoff :white_check_mark:
ask 110_k8s_events_image_pull :white_check_mark:
ask 11_init_containers :white_check_mark:
ask 13a_pending_node_selector_basic :white_check_mark:
ask 14_pending_resources :white_check_mark:
ask 15_failed_readiness_probe :white_check_mark:
ask 17_oom_kill :white_check_mark:
ask 19_detect_missing_app_details :white_check_mark:
ask 20_long_log_file_search :white_check_mark:
ask 24_misconfigured_pvc :white_check_mark:
ask 24a_misconfigured_pvc_basic :white_check_mark:
ask 28_permissions_error :construction:
ask 33_cpu_metrics_discovery :x:
ask 39_failed_toolset :white_check_mark:
ask 41_setup_argo :white_check_mark:
ask 42_dns_issues_steps_new_tools :warning:
ask 43_current_datetime_from_prompt :white_check_mark:
ask 45_fetch_deployment_logs_simple :white_check_mark:
ask 51_logs_summarize_errors :white_check_mark:
ask 53_logs_find_term :white_check_mark:
ask 54_not_truncated_when_getting_pods :white_check_mark:
ask 59_label_based_counting :white_check_mark:
ask 60_count_less_than :white_check_mark:
ask 61_exact_match_counting :white_check_mark:
ask 63_fetch_error_logs_no_errors :white_check_mark:
ask 79_configmap_mount_issue :white_check_mark:
ask 83_secret_not_found :white_check_mark:
ask 86_configmap_like_but_secret :white_check_mark:
ask 93_calling_datadog[0] :white_check_mark:
ask 93_calling_datadog[1] :white_check_mark:
ask 93_calling_datadog[2] :white_check_mark:

Legend

  • :white_check_mark: the test was successful
  • :minus: the test was skipped
  • :warning: the test failed but is known to be flaky or known to fail
  • :construction: the test had a setup failure (not a code regression)
  • :wrench: the test failed due to mock data issues (not a code regression)
  • :no_entry_sign: the test was throttled by API rate limits/overload
  • :x: the test failed and should be fixed before merging the PR

github-actions[bot] · Sep 29 '25 05:09