
Improve the algorithm for truncating tools

Open aantn opened this issue 6 months ago • 2 comments

Holmes currently truncates too aggressively (code here) by trying to be ‘fair’ and giving each tool the same output budget, even if some tools don’t need it. As a result, tools with large outputs are truncated too aggressively while tools with small outputs leave budget unused, and that leftover token budget ends up going to no tool at all.
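One way to fix the fairness problem is a waterfill-style allocation: grant each tool at most the fair share, let tools that need less release their surplus, and redistribute the surplus among the tools that still need more. A minimal sketch (the function name and interface are hypothetical, not Holmes's actual code):

```python
def allocate_budgets(output_sizes: list[int], total_budget: int) -> list[int]:
    """Assign each tool a token budget, redistributing unused fair-share
    budget from small outputs to larger ones (waterfill allocation)."""
    budgets = [0] * len(output_sizes)
    remaining = list(range(len(output_sizes)))
    budget_left = total_budget
    while remaining and budget_left > 0:
        share = budget_left // len(remaining)
        if share == 0:
            break
        # Tools whose full output fits in the current fair share are
        # satisfied exactly; their surplus stays in the pool.
        satisfied = [i for i in remaining if output_sizes[i] <= share]
        if not satisfied:
            # Everyone needs more than the share: split the pool evenly.
            for i in remaining:
                budgets[i] = share
            break
        for i in satisfied:
            budgets[i] = output_sizes[i]
            budget_left -= output_sizes[i]
            remaining.remove(i)
    return budgets
```

With a 3000-token budget and outputs of 100, 5000, and 200 tokens, the naive equal split gives every tool 1000 tokens and wastes 1700; the waterfill version gives the large output 2700 tokens instead.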

aantn avatar May 22 '25 18:05 aantn

One idea that came up from our team:

Can we have Holmes call the LLM (perhaps a cheaper model) to analyze/summarize the larger outputs, e.g. intelligently truncating the outputs as a first step? Then append the summarized data to the user prompt + runbook to produce the response.
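This idea could be sketched as a pre-processing step in front of the final LLM call. Everything below is hypothetical: the threshold value, `count_tokens`, and `call_cheap_model` are stand-in helpers, not existing HolmesGPT APIs.

```python
THRESHOLD_TOKENS = 4000  # assumed cutoff; the actual value would be tunable

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken).
    return len(text.split())

def call_cheap_model(prompt: str) -> str:
    # Placeholder for a call to a smaller/cheaper model.
    # Truncation here simulates a short summary coming back.
    return prompt[:200]

def preprocess_tool_output(output: str) -> str:
    """Pass small outputs through unchanged; summarize large ones
    with a cheaper model before the final diagnosis prompt."""
    if count_tokens(output) <= THRESHOLD_TOKENS:
        return output
    return call_cheap_model(
        "Summarize the following tool output, preserving errors, "
        "anomalies, and identifiers:\n" + output
    )
```

The summarized outputs would then be appended to the user prompt and runbook as described above.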

julia-yin avatar May 27 '25 17:05 julia-yin

As discussed today:

  • For tool calls with large outputs (> x tokens), do a first pass of summarization by calling a smaller LLM
  • Summarize data on a per-tool basis; individual tools can enable it when their output size exceeds some threshold
  • Allow specifying which LLM to use for summarization (e.g. gpt-4o-mini)
  • Append the summarized data to the final LLM call to generate the diagnosis

The main tradeoff with summarizing data beforehand is latency.
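The per-tool opt-in described above could look something like the following configuration sketch (the class, field names, and defaults are all hypothetical, chosen only to illustrate the bullets):

```python
from dataclasses import dataclass

@dataclass
class SummarizationConfig:
    """Per-tool summarization settings: tools opt in individually,
    with their own threshold and choice of cheaper model."""
    enabled: bool = False
    token_threshold: int = 4000       # assumed default "> x tokens" cutoff
    model: str = "gpt-4o-mini"        # cheaper model for the first pass

def should_summarize(cfg: SummarizationConfig, output_tokens: int) -> bool:
    # Only summarize when the tool opted in AND the output is large enough;
    # this keeps the latency cost confined to calls that actually need it.
    return cfg.enabled and output_tokens > cfg.token_threshold
```

Gating summarization behind both the opt-in flag and the threshold keeps the latency penalty limited to the tool calls whose outputs would otherwise be truncated anyway.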

julia-yin avatar May 28 '25 21:05 julia-yin