holmesgpt icon indicating copy to clipboard operation
holmesgpt copied to clipboard

feat: cache toolset status and add toolset management tool command

Open mainred opened this issue 7 months ago • 1 comments

  • use isort to sort the imported packaged
  • introduce toolset_manager to manage toolsets, which was Config's job.
  • only enable build-in toolsets specified in the config file
  • use load_toolsets_config as the only entrypoint to load initialize and validate a toolset from a config/definition
  • Besides the mcp toolset type, this PR also introduces built-in and customized toolset type to differentiate the source of these toolsets
  • introduce cli command to list and refresh the toolset status from local cache
 ./dist/holmes/holmes toolset --help

 Usage: holmes toolset [OPTIONS] COMMAND [ARGS]...

 toolset management commands

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ list      List build-int and custom toolsets status of CLI                                                                                            │
│ refresh   Refresh build-in and custom toolsets status of CLI                                                                                          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

./dist/holmes/holmes toolset list
                                                   Toolset Status
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Name                             ┃ Status   ┃ Enabled ┃ Type       ┃ Path                                ┃ Error ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ confluence                       │ disabled │ False   │ built-in   │ None                                │ None  │
│ docker/core                      │ disabled │ False   │ built-in   │ None                                │ None  │
│ helm/core                        │ disabled │ False   │ built-in   │ None                                │ None  │
│ kubernetes/logs                  │ disabled │ False   │ built-in   │ None                                │ None  │
│ servicenow                       │ disabled │ False   │ built-in   │ None                                │ None  │
│ aks/node-health                  │ disabled │ False   │ built-in   │ None                                │ None  │
│ aws/security                     │ disabled │ False   │ built-in   │ None                                │ None  │
│ aws/rds                          │ disabled │ False   │ built-in   │ None                                │ None  │
│ argocd/core                      │ disabled │ False   │ built-in   │ None                                │ None  │
│ slab                             │ disabled │ False   │ built-in   │ None                                │ None  │
│ aks/core                         │ disabled │ False   │ built-in   │ None                                │ None  │
│ kubernetes/core                  │ enabled  │ True    │ built-in   │ None                                │ None  │
│ kubernetes/live-metrics          │ disabled │ False   │ built-in   │ None                                │ None  │
│ kubernetes/kube-prometheus-stack │ disabled │ False   │ built-in   │ None                                │ None  │
│ kubernetes/krew-extras           │ disabled │ False   │ built-in   │ None                                │ None  │
│ kubernetes/kube-lineage-extras   │ disabled │ False   │ built-in   │ None                                │ None  │
│ internet                         │ disabled │ False   │ built-in   │ None                                │ None  │
│ robusta                          │ disabled │ False   │ built-in   │ None                                │ None  │
│ opensearch/status                │ disabled │ False   │ built-in   │ None                                │ None  │
│ grafana/loki                     │ disabled │ False   │ built-in   │ None                                │ None  │
│ grafana/tempo                    │ disabled │ False   │ built-in   │ None                                │ None  │
│ newrelic                         │ disabled │ False   │ built-in   │ None                                │ None  │
│ grafana/grafana                  │ disabled │ False   │ built-in   │ None                                │ None  │
│ notion                           │ disabled │ False   │ built-in   │ None                                │ None  │
│ kafka/admin                      │ disabled │ False   │ built-in   │ None                                │ None  │
│ datadog                          │ disabled │ False   │ built-in   │ None                                │ None  │
│ prometheus/metrics               │ disabled │ False   │ built-in   │ None                                │ None  │
│ datetime                         │ enabled  │ True    │ built-in   │ None                                │ None  │
│ opensearch/logs                  │ disabled │ False   │ built-in   │ None                                │ None  │
│ opensearch/traces                │ disabled │ False   │ built-in   │ None                                │ None  │
│ coralogix/logs                   │ disabled │ False   │ built-in   │ None                                │ None  │
│ rabbitmq/core                    │ disabled │ False   │ built-in   │ None                                │ None  │
│ git                              │ disabled │ False   │ built-in   │ None                                │ None  │
│  ig/core                          │ enabled  │ True    │ customized │ /home/azureuser/inspect_gadget.yaml │ None  │
./dist/holmes/holmes ask "detect why the k8s pod client under namespace test-ns cannot resolve dns" -f /home/azureuser/llm/demo/dns_troubleshooting_instructions.md --max-steps 20 -t ~/inspect_gadget.yaml
User: detect why the k8s pod client under namespace test-ns cannot resolve dns
Loading file /home/azureuser/llm/demo/dns_troubleshooting_instructions.md
Running tool kubectl_find_resource: kubectl get -A --show-labels -o wide pod | grep client                                                   tools.py:125
Running tool kubectl_describe: kubectl describe pod client -n test-ns                                                                        tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide pod -n kube-system                                          tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide svc -n kube-system                                          tools.py:125
Running tool kubectl_describe: kubectl describe pod coredns-57d886c994-8h9gt -n kube-system                                                  tools.py:125
Running tool kubectl_describe: kubectl describe pod coredns-57d886c994-vqmrz -n kube-system                                                  tools.py:125
Running tool kubectl_get_by_name: kubectl get --show-labels -o wide svc kube-dns -n kube-system                                              tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod client -n test-ns                                                                     tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod coredns-57d886c994-8h9gt -n kube-system                                               tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod coredns-57d886c994-vqmrz -n kube-system                                               tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml svc kube-dns -n kube-system                                                               tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml configmap coredns -n kube-system                                                          tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml configmap coredns-custom -n kube-system                                                   tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide networkpolicy -n test-ns                                    tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml networkpolicy default-deny-egress -n test-ns                                              tools.py:125
tool_calling_llm.call - completed in 9 iterations - 27071ms                                                                      performance_timing.py:41
AI: DNS resolution fails for pod client in namespace test-ns because a default-deny-egress NetworkPolicy is present. This policy blocks all outbound traffic,
including DNS (UDP/TCP port 53) to the CoreDNS service at 10.0.0.10.

Key findings:

 • CoreDNS pods are healthy and running.
 • CoreDNS service is correctly configured.
 • Pod spec uses dnsPolicy: ClusterFirst (default, correct).
 • NetworkPolicy default-deny-egress in test-ns blocks all egress by default.

To resolve, update the NetworkPolicy to allow egress to 10.0.0.10 on port 53/UDP and 53/TCP.

See the official Kubernetes DNS debugging guide for more: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ (section:
"Network policies blocking DNS").

Summary by CodeRabbit

  • New Features

    • Added a CLI toolset management interface with commands to list and refresh the status of built-in and custom toolsets.
    • Introduced centralized toolset management, allowing for easier merging, validation, and status caching of toolsets.
    • Added support for environment variable substitution in toolset configuration files.
    • Introduced structured resource instruction models for improved investigation context.
  • Bug Fixes

    • Corrected a typographical error in the integration guide.
  • Refactor

    • Streamlined toolset status and error handling by replacing method calls with direct attribute access.
    • Centralized toolset management logic and removed redundant code.
    • Improved import organization and code clarity throughout the project.
    • Updated configuration loading to use Path objects and simplified tool executor creation.
  • Chores

    • Updated tests and templates to align with new toolset status handling.
    • Enhanced error messages and validation for toolset configuration files.

mainred avatar May 28 '25 12:05 mainred

Summary by CodeRabbit

  • New Features

    • Added a CLI toolset management interface with commands to list and refresh statuses of built-in and custom toolsets.
    • Introduced centralized toolset management via a new ToolsetManager class for merging, validation, and status caching.
    • Added support for environment variable substitution in toolset configuration files.
    • Introduced structured resource instruction models to enhance investigation context.
  • Bug Fixes

    • Corrected a typographical error in the integration guide.
  • Refactor

    • Streamlined toolset status and error handling by replacing method calls with direct attribute access.
    • Centralized toolset management logic and removed redundant legacy code.
    • Improved import organization and code clarity across multiple modules.
    • Updated configuration loading to use Path objects and simplified tool executor creation.
    • Replaced legacy toolset loading and merging logic with the ToolsetManager class.
  • Chores

    • Updated tests and templates to align with new toolset status handling and configuration formats.
    • Enhanced error messages and validation for toolset configuration files.
    • Removed deprecated tests related to old custom toolset loading methods.
    • Added comprehensive unit tests for the ToolsetManager and toolset loading utilities.
    • Added new utility for environment variable replacement in configuration values.
    • Added tabulate dependency for improved CLI output formatting.

Walkthrough

This update centralizes toolset management by introducing a ToolsetManager class, refactors configuration and toolset loading logic, and adds new CLI commands for toolset status listing and refreshing. Several import statements are reorganized, and method calls for toolset status are replaced with direct attribute access. Additional utility modules and Pydantic models are introduced.

Changes

File(s) Change Summary
holmes/config.py Refactored to use ToolsetManager for toolset loading, merging, and status management; removed legacy toolset functions and methods; updated signatures and logic to use Path and new manager.
holmes/core/resource_instruction.py Added new Pydantic models: ResourceInstructionDocument and ResourceInstructions for investigation context.
holmes/core/supabase_dal.py, holmes/core/tool_calling_llm.py, holmes/core/tools.py Reorganized imports; removed Pydantic resource instruction models from tool_calling_llm.py; replaced toolset status method calls with attribute access; added/modified toolset types and status fields.
holmes/core/toolset_manager.py Introduced new ToolsetManager class for centralized toolset management, merging, prerequisite checking, and caching.
holmes/main.py Added CLI toolset group with list and refresh commands; updated config and custom toolset options; removed allowed_toolsets logic; improved CLI option defaults and help text.
holmes/plugins/prompts/_fetch_logs.jinja2, holmes/plugins/prompts/_toolsets_instructions.jinja2 Replaced get_status()/get_error() method calls with direct attribute access for toolset status and error.
holmes/plugins/toolsets/init.py Refactored toolset YAML loading and validation; added environment variable replacement; improved error handling and validation.
holmes/plugins/toolsets/consts.py Added constants for toolset configuration errors and standard parameter descriptions.
holmes/plugins/toolsets/coralogix/toolset_coralogix_logs.py, holmes/plugins/toolsets/grafana/base_grafana_toolset.py, holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py, holmes/plugins/toolsets/kafka.py, holmes/plugins/toolsets/opensearch/opensearch.py, holmes/plugins/toolsets/prometheus/prometheus.py Reorganized import statements; updated constant imports to use new consts.py; minor formatting changes.
holmes/plugins/toolsets/utils.py Removed unused constants and reordered imports.
holmes/utils/default_toolset_installation_guide.jinja2 Fixed typo in template text.
holmes/utils/env.py Added new utility module for recursive environment variable replacement in configs.
holmes/utils/holmes_sync_toolsets.py Replaced toolset status method calls with attribute access.
holmes/utils/pydantic_utils.py Changed load_model_from_file to accept Path instead of str; minor error message formatting.
tests/llm/utils/mock_toolset.py, tests/plugins/toolsets/grafana/test_grafana_loki.py, tests/plugins/toolsets/grafana/test_grafana_tempo.py, tests/plugins/toolsets/test_notion.py, tests/plugins/toolsets/test_prometheus_integration.py, tests/plugins/toolsets/test_tool_kafka.py Updated tests to use toolset status attribute instead of method; reorganized imports.
tests/core/test_toolset_manager.py Added comprehensive unit tests for ToolsetManager covering loading, merging, caching, filtering, and error handling.
tests/plugins/toolsets/test_load_config.py Added tests for load_toolsets_from_config validating old format rejection and environment variable substitution.
tests/config_class/test_config_load_custom_toolsets.py Removed legacy tests for custom toolset loading from Config class.
tests/config_class/test_config_load_remote_mcp_server.py Updated imports and calls to use new load_toolsets_from_config instead of removed legacy functions.
tests/test_check_prerequisites.py, tests/test_holmes_sync_toolsets.py Replaced calls to get_error() with direct access to error attribute in assertions.

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Config
    participant ToolsetManager
    participant Toolset
    participant Cache

    CLI->>Config: load_from_file(config_file, custom_toolsets)
    Config->>ToolsetManager: initialize(toolsets, custom_toolsets)
    Config->>ToolsetManager: .load_toolset_with_status()
    ToolsetManager->>Cache: load cached toolset status
    ToolsetManager->>Toolset: check prerequisites (if refresh needed)
    Toolset->>ToolsetManager: return status, error
    ToolsetManager->>Cache: update cache if needed
    ToolsetManager->>Config: return list of Toolset objects with status
    Config->>CLI: provide toolsets for CLI commands

    CLI->>CLI: pretty_print_toolset_status(toolsets)

Assessment against linked issues

Objective Addressed Explanation
Disable all toolsets by default and load only explicitly enabled toolsets from config (#426)
Simplify and speed up HolmesGPT startup by avoiding prerequisite checks on all toolsets (#426)
Provide CLI commands to list and refresh toolset statuses (#426)

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Addition of new Pydantic models for resource instructions in holmes/core/resource_instruction.py Not related to toolset loading or CLI changes; unrelated to linked issue objectives.
Typo fix in holmes/utils/default_toolset_installation_guide.jinja2 Minor documentation fix, unrelated to linked issue objectives.
Removal of legacy test modules for custom toolset loading Testing cleanup related to refactor, not explicitly stated in linked issues.
✨ Finishing Touches
  • [ ] 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai[bot] avatar May 28 '25 12:05 coderabbitai[bot]

Had call with @moshemorad, the notes from the call

print toolset is load from cache during ask command
fix the toolset table issue
update the custom-toolset flag description to the toolset is for experimental and not made permanent
add doc to explain the customer-facing changes.

mainred avatar Jun 05 '25 12:06 mainred

Could you please take a look at this precommit check failure? where should I add this dependency tabulate? @@moshemorad Thanks in advance.

I have updated the pyproject.toml to include the package.

how should I deal with this build error? feat: cache toolset status and add toolset management tool command · robusta-dev/holmesgpt@b505f81

holmes/main.py:5: error: Library stubs not installed for "tabulate"  [import-untyped]
[161](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:166)
holmes/main.py:5: note: Hint: "python3 -m pip install types-tabulate"
[162](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:167)
holmes/main.py:5: note: (or run "mypy --install-types" to install all missing stub packages)
[163](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:168)
holmes/main.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
[164](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:169)
Found 1 error in 1 file (checked 85 source files)

mainred avatar Jun 05 '25 12:06 mainred

I have resolved code conflicts multiple times since my change touch the config.py. I'll focus on the PR comments if there are incoming ones, and until all looks good, I'll resolve the conflicts. cc @moshemorad

mainred avatar Jun 05 '25 15:06 mainred

Could you please take a look at this precommit check failure? where should I add this dependency tabulate? @@moshemorad Thanks in advance.

I have updated the pyproject.toml to include the package.

how should I deal with this build error? feat: cache toolset status and add toolset management tool command · robusta-dev/holmesgpt@b505f81

holmes/main.py:5: error: Library stubs not installed for "tabulate"  [import-untyped]
[161](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:166)
holmes/main.py:5: note: Hint: "python3 -m pip install types-tabulate"
[162](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:167)
holmes/main.py:5: note: (or run "mypy --install-types" to install all missing stub packages)
[163](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:168)
holmes/main.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
[164](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:169)
Found 1 error in 1 file (checked 85 source files)

Currently add # type: ignore

moshemorad avatar Jun 05 '25 15:06 moshemorad

I have resolved code conflicts multiple times since my change touch the config.py. I'll focus on the PR comments if there are incoming ones, and until all looks good, I'll resolve the conflicts. cc @moshemorad

The fix is due to the missed error when we changed private attr in the Toolset Model. So it might raise more issue. Let me know if you bump into anything that need my help.

moshemorad avatar Jun 05 '25 15:06 moshemorad

Let me know if you bump into anything that need my help.

This PR is ready for review. Please take another look. Thanks.

mainred avatar Jun 05 '25 15:06 mainred

Please ignore the conflicts lets focus on the PR change first. I will definitely resolve the conflicts after addressing all comments to reduce the chance of resolving conflicts.

mainred avatar Jun 05 '25 15:06 mainred

@moshemorad could you please help find the root cause of llm evaluation error. The following error message raised from the pipeline seems to be common issue? Thanks. `httpx.LocalProtocolError: Illegal header value b'Bearer '

mainred avatar Jun 06 '25 05:06 mainred

@moshemorad could you please help find the root cause of llm evaluation error. The following error message raised from the pipeline seems to be common issue? Thanks. `httpx.LocalProtocolError: Illegal header value b'Bearer '

Looks like for some reason it isn't adding the OpenAI key, trying to understand why.

moshemorad avatar Jun 09 '25 11:06 moshemorad

@mainred Looks like the issue is due to forked repos don't use the original repo secrets. https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions#using-secrets-in-a-workflow

@arikalon1 @nherment any idea what we want to do in those cases?

moshemorad avatar Jun 09 '25 11:06 moshemorad