feat: cache toolset status and add toolset management tool command
- use isort to sort the imported packaged
- introduce toolset_manager to manage toolsets, which was Config's job.
- only enable build-in toolsets specified in the config file
- use load_toolsets_config as the only entrypoint to load initialize and validate a toolset from a config/definition
- Besides the
mcptoolset type, this PR also introducesbuilt-inandcustomizedtoolset type to differentiate the source of these toolsets - introduce cli command to list and refresh the toolset status from local cache
./dist/holmes/holmes toolset --help
Usage: holmes toolset [OPTIONS] COMMAND [ARGS]...
toolset management commands
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ list List build-int and custom toolsets status of CLI │
│ refresh Refresh build-in and custom toolsets status of CLI │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
./dist/holmes/holmes toolset list
Toolset Status
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Name ┃ Status ┃ Enabled ┃ Type ┃ Path ┃ Error ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ confluence │ disabled │ False │ built-in │ None │ None │
│ docker/core │ disabled │ False │ built-in │ None │ None │
│ helm/core │ disabled │ False │ built-in │ None │ None │
│ kubernetes/logs │ disabled │ False │ built-in │ None │ None │
│ servicenow │ disabled │ False │ built-in │ None │ None │
│ aks/node-health │ disabled │ False │ built-in │ None │ None │
│ aws/security │ disabled │ False │ built-in │ None │ None │
│ aws/rds │ disabled │ False │ built-in │ None │ None │
│ argocd/core │ disabled │ False │ built-in │ None │ None │
│ slab │ disabled │ False │ built-in │ None │ None │
│ aks/core │ disabled │ False │ built-in │ None │ None │
│ kubernetes/core │ enabled │ True │ built-in │ None │ None │
│ kubernetes/live-metrics │ disabled │ False │ built-in │ None │ None │
│ kubernetes/kube-prometheus-stack │ disabled │ False │ built-in │ None │ None │
│ kubernetes/krew-extras │ disabled │ False │ built-in │ None │ None │
│ kubernetes/kube-lineage-extras │ disabled │ False │ built-in │ None │ None │
│ internet │ disabled │ False │ built-in │ None │ None │
│ robusta │ disabled │ False │ built-in │ None │ None │
│ opensearch/status │ disabled │ False │ built-in │ None │ None │
│ grafana/loki │ disabled │ False │ built-in │ None │ None │
│ grafana/tempo │ disabled │ False │ built-in │ None │ None │
│ newrelic │ disabled │ False │ built-in │ None │ None │
│ grafana/grafana │ disabled │ False │ built-in │ None │ None │
│ notion │ disabled │ False │ built-in │ None │ None │
│ kafka/admin │ disabled │ False │ built-in │ None │ None │
│ datadog │ disabled │ False │ built-in │ None │ None │
│ prometheus/metrics │ disabled │ False │ built-in │ None │ None │
│ datetime │ enabled │ True │ built-in │ None │ None │
│ opensearch/logs │ disabled │ False │ built-in │ None │ None │
│ opensearch/traces │ disabled │ False │ built-in │ None │ None │
│ coralogix/logs │ disabled │ False │ built-in │ None │ None │
│ rabbitmq/core │ disabled │ False │ built-in │ None │ None │
│ git │ disabled │ False │ built-in │ None │ None │
│ ig/core │ enabled │ True │ customized │ /home/azureuser/inspect_gadget.yaml │ None │
./dist/holmes/holmes ask "detect why the k8s pod client under namespace test-ns cannot resolve dns" -f /home/azureuser/llm/demo/dns_troubleshooting_instructions.md --max-steps 20 -t ~/inspect_gadget.yaml
User: detect why the k8s pod client under namespace test-ns cannot resolve dns
Loading file /home/azureuser/llm/demo/dns_troubleshooting_instructions.md
Running tool kubectl_find_resource: kubectl get -A --show-labels -o wide pod | grep client tools.py:125
Running tool kubectl_describe: kubectl describe pod client -n test-ns tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide pod -n kube-system tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide svc -n kube-system tools.py:125
Running tool kubectl_describe: kubectl describe pod coredns-57d886c994-8h9gt -n kube-system tools.py:125
Running tool kubectl_describe: kubectl describe pod coredns-57d886c994-vqmrz -n kube-system tools.py:125
Running tool kubectl_get_by_name: kubectl get --show-labels -o wide svc kube-dns -n kube-system tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod client -n test-ns tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod coredns-57d886c994-8h9gt -n kube-system tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml pod coredns-57d886c994-vqmrz -n kube-system tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml svc kube-dns -n kube-system tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml configmap coredns -n kube-system tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml configmap coredns-custom -n kube-system tools.py:125
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide networkpolicy -n test-ns tools.py:125
Running tool kubectl_get_yaml: kubectl get -o yaml networkpolicy default-deny-egress -n test-ns tools.py:125
tool_calling_llm.call - completed in 9 iterations - 27071ms performance_timing.py:41
AI: DNS resolution fails for pod client in namespace test-ns because a default-deny-egress NetworkPolicy is present. This policy blocks all outbound traffic,
including DNS (UDP/TCP port 53) to the CoreDNS service at 10.0.0.10.
Key findings:
• CoreDNS pods are healthy and running.
• CoreDNS service is correctly configured.
• Pod spec uses dnsPolicy: ClusterFirst (default, correct).
• NetworkPolicy default-deny-egress in test-ns blocks all egress by default.
To resolve, update the NetworkPolicy to allow egress to 10.0.0.10 on port 53/UDP and 53/TCP.
See the official Kubernetes DNS debugging guide for more: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ (section:
"Network policies blocking DNS").
Summary by CodeRabbit
-
New Features
- Added a CLI toolset management interface with commands to list and refresh the status of built-in and custom toolsets.
- Introduced centralized toolset management, allowing for easier merging, validation, and status caching of toolsets.
- Added support for environment variable substitution in toolset configuration files.
- Introduced structured resource instruction models for improved investigation context.
-
Bug Fixes
- Corrected a typographical error in the integration guide.
-
Refactor
- Streamlined toolset status and error handling by replacing method calls with direct attribute access.
- Centralized toolset management logic and removed redundant code.
- Improved import organization and code clarity throughout the project.
- Updated configuration loading to use Path objects and simplified tool executor creation.
-
Chores
- Updated tests and templates to align with new toolset status handling.
- Enhanced error messages and validation for toolset configuration files.
Summary by CodeRabbit
-
New Features
- Added a CLI toolset management interface with commands to list and refresh statuses of built-in and custom toolsets.
- Introduced centralized toolset management via a new ToolsetManager class for merging, validation, and status caching.
- Added support for environment variable substitution in toolset configuration files.
- Introduced structured resource instruction models to enhance investigation context.
-
Bug Fixes
- Corrected a typographical error in the integration guide.
-
Refactor
- Streamlined toolset status and error handling by replacing method calls with direct attribute access.
- Centralized toolset management logic and removed redundant legacy code.
- Improved import organization and code clarity across multiple modules.
- Updated configuration loading to use Path objects and simplified tool executor creation.
- Replaced legacy toolset loading and merging logic with the ToolsetManager class.
-
Chores
- Updated tests and templates to align with new toolset status handling and configuration formats.
- Enhanced error messages and validation for toolset configuration files.
- Removed deprecated tests related to old custom toolset loading methods.
- Added comprehensive unit tests for the ToolsetManager and toolset loading utilities.
- Added new utility for environment variable replacement in configuration values.
- Added
tabulatedependency for improved CLI output formatting.
Walkthrough
This update centralizes toolset management by introducing a ToolsetManager class, refactors configuration and toolset loading logic, and adds new CLI commands for toolset status listing and refreshing. Several import statements are reorganized, and method calls for toolset status are replaced with direct attribute access. Additional utility modules and Pydantic models are introduced.
Changes
| File(s) | Change Summary |
|---|---|
| holmes/config.py | Refactored to use ToolsetManager for toolset loading, merging, and status management; removed legacy toolset functions and methods; updated signatures and logic to use Path and new manager. |
| holmes/core/resource_instruction.py | Added new Pydantic models: ResourceInstructionDocument and ResourceInstructions for investigation context. |
| holmes/core/supabase_dal.py, holmes/core/tool_calling_llm.py, holmes/core/tools.py | Reorganized imports; removed Pydantic resource instruction models from tool_calling_llm.py; replaced toolset status method calls with attribute access; added/modified toolset types and status fields. |
| holmes/core/toolset_manager.py | Introduced new ToolsetManager class for centralized toolset management, merging, prerequisite checking, and caching. |
| holmes/main.py | Added CLI toolset group with list and refresh commands; updated config and custom toolset options; removed allowed_toolsets logic; improved CLI option defaults and help text. |
| holmes/plugins/prompts/_fetch_logs.jinja2, holmes/plugins/prompts/_toolsets_instructions.jinja2 | Replaced get_status()/get_error() method calls with direct attribute access for toolset status and error. |
| holmes/plugins/toolsets/init.py | Refactored toolset YAML loading and validation; added environment variable replacement; improved error handling and validation. |
| holmes/plugins/toolsets/consts.py | Added constants for toolset configuration errors and standard parameter descriptions. |
| holmes/plugins/toolsets/coralogix/toolset_coralogix_logs.py, holmes/plugins/toolsets/grafana/base_grafana_toolset.py, holmes/plugins/toolsets/grafana/toolset_grafana_tempo.py, holmes/plugins/toolsets/kafka.py, holmes/plugins/toolsets/opensearch/opensearch.py, holmes/plugins/toolsets/prometheus/prometheus.py | Reorganized import statements; updated constant imports to use new consts.py; minor formatting changes. |
| holmes/plugins/toolsets/utils.py | Removed unused constants and reordered imports. |
| holmes/utils/default_toolset_installation_guide.jinja2 | Fixed typo in template text. |
| holmes/utils/env.py | Added new utility module for recursive environment variable replacement in configs. |
| holmes/utils/holmes_sync_toolsets.py | Replaced toolset status method calls with attribute access. |
| holmes/utils/pydantic_utils.py | Changed load_model_from_file to accept Path instead of str; minor error message formatting. |
| tests/llm/utils/mock_toolset.py, tests/plugins/toolsets/grafana/test_grafana_loki.py, tests/plugins/toolsets/grafana/test_grafana_tempo.py, tests/plugins/toolsets/test_notion.py, tests/plugins/toolsets/test_prometheus_integration.py, tests/plugins/toolsets/test_tool_kafka.py | Updated tests to use toolset status attribute instead of method; reorganized imports. |
| tests/core/test_toolset_manager.py | Added comprehensive unit tests for ToolsetManager covering loading, merging, caching, filtering, and error handling. |
| tests/plugins/toolsets/test_load_config.py | Added tests for load_toolsets_from_config validating old format rejection and environment variable substitution. |
| tests/config_class/test_config_load_custom_toolsets.py | Removed legacy tests for custom toolset loading from Config class. |
| tests/config_class/test_config_load_remote_mcp_server.py | Updated imports and calls to use new load_toolsets_from_config instead of removed legacy functions. |
| tests/test_check_prerequisites.py, tests/test_holmes_sync_toolsets.py | Replaced calls to get_error() with direct access to error attribute in assertions. |
Sequence Diagram(s)
sequenceDiagram
participant CLI
participant Config
participant ToolsetManager
participant Toolset
participant Cache
CLI->>Config: load_from_file(config_file, custom_toolsets)
Config->>ToolsetManager: initialize(toolsets, custom_toolsets)
Config->>ToolsetManager: .load_toolset_with_status()
ToolsetManager->>Cache: load cached toolset status
ToolsetManager->>Toolset: check prerequisites (if refresh needed)
Toolset->>ToolsetManager: return status, error
ToolsetManager->>Cache: update cache if needed
ToolsetManager->>Config: return list of Toolset objects with status
Config->>CLI: provide toolsets for CLI commands
CLI->>CLI: pretty_print_toolset_status(toolsets)
Assessment against linked issues
| Objective | Addressed | Explanation |
|---|---|---|
| Disable all toolsets by default and load only explicitly enabled toolsets from config (#426) | ✅ | |
| Simplify and speed up HolmesGPT startup by avoiding prerequisite checks on all toolsets (#426) | ✅ | |
| Provide CLI commands to list and refresh toolset statuses (#426) | ✅ |
Assessment against linked issues: Out-of-scope changes
| Code Change | Explanation |
|---|---|
Addition of new Pydantic models for resource instructions in holmes/core/resource_instruction.py |
Not related to toolset loading or CLI changes; unrelated to linked issue objectives. |
Typo fix in holmes/utils/default_toolset_installation_guide.jinja2 |
Minor documentation fix, unrelated to linked issue objectives. |
| Removal of legacy test modules for custom toolset loading | Testing cleanup related to refactor, not explicitly stated in linked issues. |
✨ Finishing Touches
- [ ] 📝 Generate Docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>, please review it.Explain this complex logic.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. Examples:@coderabbitai explain this code block.@coderabbitai modularize this function.
- PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.@coderabbitai read src/utils.ts and explain its main purpose.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.@coderabbitai help me debug CodeRabbit configuration file.
Support
Need help? Create a ticket on our support page for assistance with any issues or questions.
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (Invoked using PR comments)
@coderabbitai pauseto pause the reviews on a PR.@coderabbitai resumeto resume the paused reviews.@coderabbitai reviewto trigger an incremental review. This is useful when automatic reviews are disabled for the repository.@coderabbitai full reviewto do a full review from scratch and review all the files again.@coderabbitai summaryto regenerate the summary of the PR.@coderabbitai generate docstringsto generate docstrings for this PR.@coderabbitai generate sequence diagramto generate a sequence diagram of the changes in this PR.@coderabbitai resolveresolve all the CodeRabbit review comments.@coderabbitai configurationto show the current CodeRabbit configuration for the repository.@coderabbitai helpto get help.
Other keywords and placeholders
- Add
@coderabbitai ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaianywhere in the PR title to generate the title automatically.
CodeRabbit Configuration File (.coderabbit.yaml)
- You can programmatically configure CodeRabbit by adding a
.coderabbit.yamlfile to the root of your repository. - Please see the configuration documentation for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Documentation and Community
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
Had call with @moshemorad, the notes from the call
print toolset is load from cache during ask command
fix the toolset table issue
update the custom-toolset flag description to the toolset is for experimental and not made permanent
add doc to explain the customer-facing changes.
Could you please take a look at this precommit check failure? where should I add this dependency tabulate? @@moshemorad Thanks in advance.
I have updated the pyproject.toml to include the package.
how should I deal with this build error? feat: cache toolset status and add toolset management tool command · robusta-dev/holmesgpt@b505f81
holmes/main.py:5: error: Library stubs not installed for "tabulate" [import-untyped]
[161](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:166)
holmes/main.py:5: note: Hint: "python3 -m pip install types-tabulate"
[162](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:167)
holmes/main.py:5: note: (or run "mypy --install-types" to install all missing stub packages)
[163](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:168)
holmes/main.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
[164](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:169)
Found 1 error in 1 file (checked 85 source files)
I have resolved code conflicts multiple times since my change touch the config.py. I'll focus on the PR comments if there are incoming ones, and until all looks good, I'll resolve the conflicts. cc @moshemorad
Could you please take a look at this precommit check failure? where should I add this dependency tabulate? @@moshemorad Thanks in advance.
I have updated the pyproject.toml to include the package.
how should I deal with this build error? feat: cache toolset status and add toolset management tool command · robusta-dev/holmesgpt@b505f81
holmes/main.py:5: error: Library stubs not installed for "tabulate" [import-untyped] [161](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:166) holmes/main.py:5: note: Hint: "python3 -m pip install types-tabulate" [162](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:167) holmes/main.py:5: note: (or run "mypy --install-types" to install all missing stub packages) [163](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:168) holmes/main.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports [164](https://github.com/robusta-dev/holmesgpt/actions/runs/15464595950/job/43533650601?pr=459#step:4:169) Found 1 error in 1 file (checked 85 source files)
Currently add # type: ignore
I have resolved code conflicts multiple times since my change touch the config.py. I'll focus on the PR comments if there are incoming ones, and until all looks good, I'll resolve the conflicts. cc @moshemorad
The fix is due to the missed error when we changed private attr in the Toolset Model. So it might raise more issue. Let me know if you bump into anything that need my help.
Let me know if you bump into anything that need my help.
This PR is ready for review. Please take another look. Thanks.
Please ignore the conflicts lets focus on the PR change first. I will definitely resolve the conflicts after addressing all comments to reduce the chance of resolving conflicts.
@moshemorad could you please help find the root cause of llm evaluation error. The following error message raised from the pipeline seems to be common issue? Thanks. `httpx.LocalProtocolError: Illegal header value b'Bearer '
@moshemorad could you please help find the root cause of llm evaluation error. The following error message raised from the pipeline seems to be common issue? Thanks. `httpx.LocalProtocolError: Illegal header value b'Bearer '
Looks like for some reason it isn't adding the OpenAI key, trying to understand why.
@mainred Looks like the issue is due to forked repos don't use the original repo secrets. https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions#using-secrets-in-a-workflow
@arikalon1 @nherment any idea what we want to do in those cases?