Improve the algorithm for truncating tools
Holmes currently truncates too aggressively (code here) by trying to be ‘fair’ and giving each tool the same output budget, even if some tools don’t need it. As a result, tools with large outputs are truncated more than necessary, while the budget left over by tools with small outputs goes unused.
One idea that came up from our team:
Can we have Holmes call the LLM (perhaps a cheaper model) to analyze/summarize the larger outputs, i.e. intelligently truncate the outputs as a first step? The summarized data would then be appended to the user prompt + runbook to produce the response.
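A minimal sketch of what that first pass could look like, assuming an OpenAI-style client and tiktoken for token counting (all names and thresholds here are hypothetical, not existing Holmes code):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
SUMMARY_MODEL = "gpt-4o-mini"   # cheaper model used only for summarization
SUMMARY_THRESHOLD = 4_000       # hypothetical cutoff in tokens
enc = tiktoken.encoding_for_model("gpt-4o")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def summarize_tool_output(
    tool_name: str, output: str, question: str, model: str = SUMMARY_MODEL
) -> str:
    """Pass small outputs through unchanged; summarize large ones with the cheap model."""
    if count_tokens(output) <= SUMMARY_THRESHOLD:
        return output
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize this tool output, keeping every detail relevant "
                    "to the user's question. Preserve exact identifiers, error "
                    "messages, and timestamps."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nOutput of {tool_name}:\n{output}",
            },
        ],
    )
    return resp.choices[0].message.content
```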
As discussed today:
- For tool calls with large outputs (> x tokens), do a first pass of summarization by calling a smaller LLM model
- Data summarization happens on a per-tool basis; individual tools can enable it when their output size exceeds some threshold
- Specify which LLM model to use for summarization (e.g. gpt-4o-mini)
- Append the summarized data to the final LLM call to generate the diagnosis
The main tradeoff with summarizing data beforehand is added latency: each large tool output costs an extra LLM round trip before the final call.
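Putting the list together, a hypothetical wiring could look like the following. The result objects, `truncate()`, and the config shape are all made up for illustration; `count_tokens` and `summarize_tool_output` come from the sketch above:

```python
from dataclasses import dataclass

@dataclass
class ToolSummaryConfig:
    enabled: bool = False            # tools opt in to summarization explicitly
    threshold_tokens: int = 4_000    # only summarize above this size
    model: str = "gpt-4o-mini"       # cheaper model for the first pass

def prepare_tool_outputs(results, configs: dict, question: str) -> str:
    """Summarize large outputs from opted-in tools; truncate everything else."""
    sections = []
    for r in results:  # each result is assumed to have .tool_name and .output
        cfg = configs.get(r.tool_name, ToolSummaryConfig())
        if cfg.enabled and count_tokens(r.output) > cfg.threshold_tokens:
            text = summarize_tool_output(r.tool_name, r.output, question, cfg.model)
        else:
            text = truncate(r.output)  # fall back to the existing truncation path
        sections.append(f"### {r.tool_name}\n{text}")
    # The prepared (summarized or truncated) outputs are appended to the
    # final LLM call that generates the diagnosis.
    return "\n\n".join(sections)
```

Since each large output would be summarized independently, the extra calls could in principle run concurrently, keeping the added latency closer to one round trip rather than one per large output.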