Working Holmes with KAITO integration - tool calling verified
This PR fixes tool calling issues that broke Holmes when using KAITO-deployed models on AKS. The changes have been tested and verified against both fresh Holmes installations and existing KAITO deployments.
Walkthrough
This PR introduces KAITO-specific improvements to Holmes' LLM system by adding environment-based tool choice configuration, new accuracy and formatting prompts, revised system prompt guidance, disabled CoreInvestigationToolset defaults, custom endpoint support for classifiers, and a comprehensive evaluation runner script with Braintrust integration.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Core Logic Modifications**<br>`holmes/core/llm.py`, `holmes/core/tool_calling_llm.py` | Tool choice handling broadened from requiring the literal string `"auto"` to accepting any truthy value; the environment variable `HOLMES_TOOL_CHOICE` now sources the tool choice instead of a hardcoded `"auto"`, with debug logging added. |
| **System Prompt Updates**<br>`holmes/core/prompt.py` | Added KAITO-specific anti-JSON enforcement and conciseness blocks to the system prompt additions; commented out the TodoWrite system reminder with a KAITO patch comment. |
| **Prompt Templates**<br>`holmes/plugins/prompts/_kaito_accuracy.jinja2`, `holmes/plugins/prompts/generic_ask.jinja2` | New KAITO accuracy template with counting/verification guidance; the `generic_ask` template gains tool-call workflows, natural-language response requirements, a JSON prohibition, conciseness enforcement, and numerical-accuracy examples. |
| **Toolset Configuration**<br>`holmes/plugins/toolsets/investigator/core_investigation.py` | `CoreInvestigationToolset` initialization changed to be disabled by default (`enabled=False`, `is_default=False`). |
| **Evaluation Infrastructure**<br>`run_kaito_evals.sh`, `tests/llm/test_ask_holmes.py`, `tests/llm/utils/classifiers.py`, `tests/llm/utils/mock_toolset.py` | New bash runner script for KAITO evaluations with CLI parsing, environment orchestration, and Braintrust integration; `max_steps` reduced from 40 to 10; classifier endpoint support via environment variables; `KAITO_CONFIG_PATH` override for toolset loading. |
| **Configuration & Documentation**<br>`pyproject.toml`, `kaito_improvements.md` | File-based logging disabled in `pyproject.toml`; new KAITO improvements document detailing strategies for reducing hallucinations and counting errors. |
| **Empty/Placeholder Files**<br>`Set`, `environment`, `variables`, `kaito_eval_output.log`, `model` | Created empty placeholder files with no executable content. |
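The toolset-default change in the table above can be sketched as follows; the base-class constructor here is a simplified stand-in for illustration, not the actual Holmes `Toolset` API:

```python
# Simplified stand-in for Holmes' Toolset base class (the real constructor
# takes more parameters; this signature is an illustrative assumption).
class Toolset:
    def __init__(self, name: str, enabled: bool = True, is_default: bool = True):
        self.name = name
        self.enabled = enabled
        self.is_default = is_default


class CoreInvestigationToolset(Toolset):
    def __init__(self) -> None:
        # KAITO patch: disabled by default, so the toolset must be
        # explicitly enabled in configuration to participate.
        super().__init__("investigator/core", enabled=False, is_default=False)
```

The effect is that the investigator toolset no longer runs unless an operator opts in, which is the behavior the PR summary describes.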
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant Tool as ToolCallingLLM
    participant Env as Environment
    participant LLM as LLM Client
    participant Handler as Tool Handler
    Test->>Env: Check HOLMES_TOOL_CHOICE
    Env-->>Tool: Return tool choice value (env var or "auto")
    activate Tool
    alt tools present and tool_choice truthy
        Tool->>LLM: call(tools=tools, tool_choice=HOLMES_TOOL_CHOICE)
        LLM->>Handler: Process tool calls
        Handler-->>LLM: Tool results
        LLM-->>Tool: LLM response with tool results
        Tool->>Tool: Log HOLMES_TOOL_CHOICE value
    else no tools or falsy tool_choice
        Tool->>LLM: call(tools=None)
        LLM-->>Tool: Direct LLM response
    end
    deactivate Tool
    Tool-->>Test: Final response
```
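The flow in the diagram above amounts to a small resolution step. A minimal sketch, assuming the function name is illustrative (the real logic lives in `holmes/core/llm.py` and `holmes/core/tool_calling_llm.py`):

```python
import logging
import os


def resolve_tool_choice(tools):
    """Return the tool_choice to pass to the LLM, or None to skip tools.

    Illustrative sketch: reads HOLMES_TOOL_CHOICE from the environment
    (defaulting to "auto") and applies the truthy check described above.
    """
    tool_choice = os.environ.get("HOLMES_TOOL_CHOICE", "auto")
    logging.debug("HOLMES_TOOL_CHOICE=%s", tool_choice)
    # Any truthy value is accepted (not just the literal "auto");
    # with no tools or a falsy value, tools are omitted entirely.
    if tools and tool_choice:
        return tool_choice
    return None
```

The key behavioral change is the truthy check: values like `"required"` or a specific tool name pass through, whereas the old code only honored the exact string `"auto"`.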
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
- `run_kaito_evals.sh`: High logic density, with complex argument parsing, environment orchestration, and test command construction; requires careful validation of all conditional branches and environment variable handling.
- `generic_ask.jinja2`: Significant prompt logic changes affecting LLM behavior and response structure; requires understanding the cascading effects on model outputs and tool-calling workflows.
- `tests/llm/utils/classifiers.py`: Multiple endpoint handling paths (custom/KAITO, Azure, standard OpenAI) with new environment variable dependencies; the logic across `create_llm_client` and `evaluate_correctness` requires tracing the effective URL/key/model resolution.
- `holmes/core/tool_calling_llm.py` & `holmes/core/llm.py`: Tool choice logic changes affecting broader system behavior; the interaction between the truthy check and environment variable sourcing needs validation.
- Coherence across multiple modified files: Changes to prompts, tool choice, endpoints, and toolset defaults interact; a holistic understanding is required to validate intended behavior.
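The URL/key/model resolution called out for `tests/llm/utils/classifiers.py` could look roughly like this; the environment variable names and the fallback order are assumptions for illustration, not necessarily the exact ones in the PR:

```python
import os


def resolve_classifier_config(env=os.environ):
    """Pick the effective base URL / API key / model for the classifier.

    Illustrative resolution order (an assumption): a custom/KAITO
    endpoint wins, then Azure, then standard OpenAI defaults.
    """
    if env.get("CLASSIFIER_BASE_URL"):  # custom / KAITO endpoint
        return {
            "base_url": env["CLASSIFIER_BASE_URL"],
            "api_key": env.get("CLASSIFIER_API_KEY", "not-needed"),
            "model": env.get("CLASSIFIER_MODEL", "local-model"),
        }
    if env.get("AZURE_API_BASE"):  # Azure OpenAI deployment
        return {
            "base_url": env["AZURE_API_BASE"],
            "api_key": env.get("AZURE_API_KEY", ""),
            "model": env.get("CLASSIFIER_MODEL", "gpt-4o"),
        }
    # standard OpenAI fallback
    return {
        "base_url": "https://api.openai.com/v1",
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("CLASSIFIER_MODEL", "gpt-4o"),
    }
```

Tracing which branch a given test environment hits is exactly the review work the effort estimate above is flagging.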
Possibly related PRs
- robusta-dev/holmesgpt#563: Modifies toolset default/enabled behavior (KubernetesLogsToolset and mock toolset enabling), directly related to CoreInvestigationToolset disabled-by-default changes.
- robusta-dev/holmesgpt#823: Adds AI safety partial to prompt templates and modifies system guidance blocks similar to KAITO-specific accuracy and conciseness additions in this PR.
- robusta-dev/holmesgpt#729: Modifies `generic_ask.jinja2` prompts and the `tests/llm/utils/mock_toolset.py` loader in overlapping ways.
Suggested reviewers
- Sheeproid
- arikalon1
- moshemorad
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: fixing tool calling issues for Holmes with KAITO integration and verifying the fix works. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the problem being fixed (tool calling issues with KAITO models) and that the solution has been tested. |
moved to https://github.com/HolmesGPT/holmesgpt/pull/1186