error when fetching logs
Running: kubectl logs kong-gateway-54cff884d5-jd8pt -n omnimizer-system
Error: reading streaming LLM response: Error 400, Message: The input token count (2002605) exceeds the maximum number of tokens allowed (1048576)., Status: INVALID_ARGUMENT, Details: []
This is due to your input size: the task was probably too big for the model you are using, so the request's input exceeded the model's token limit. It is not an error in the repo itself.
Try reducing the input size. I lack background on your setup, since this depends directly on your LLM and model provider, but here is some reading about what tokens are and how they work.
From the error it is clear that the token limit was 1M and request size was around 2M.
Not sure what exact task you had in mind. You could try asking for only the most recent 500 lines of logs (the equivalent of `kubectl logs --tail=500`) if that helps.
@zvdy even though this is not an error in the repo itself, I do think context size management is an area where we haven't done much work, and I am sure there is some low-hanging fruit to improve the overall user experience.
Agreed. We could fetch the model's token limits and guardrail the request/response context based on them. Is this what you had in mind?
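Something along the lines of this minimal sketch, where `estimateTokens`, `guardContext`, and the 4-characters-per-token heuristic are all hypothetical placeholders for illustration, not anything kubectl-ai actually ships:

```go
// Sketch of a pre-send guardrail: check an estimated request size against
// the model's context window before calling the provider.
package main

import (
	"errors"
	"fmt"
)

// estimateTokens is a crude approximation: roughly 4 characters per token
// for English text. A real implementation would use the provider's own
// tokenizer or token-counting endpoint where available.
func estimateTokens(text string) int {
	return len(text)/4 + 1
}

var errContextTooLarge = errors.New("request would exceed the model's context window")

// guardContext rejects (or could truncate) a prompt that is likely to blow
// past the model's input limit, leaving headroom reserved for the response.
func guardContext(prompt string, modelInputLimit, responseReserve int) error {
	budget := modelInputLimit - responseReserve
	if est := estimateTokens(prompt); est > budget {
		return fmt.Errorf("%w: estimated %d tokens, budget %d", errContextTooLarge, est, budget)
	}
	return nil
}

func main() {
	// 1,048,576 is the limit reported in the error above.
	const limit = 1_048_576
	prompt := "kubectl logs output ..." // imagine several MB of log text here
	if err := guardContext(prompt, limit, 8_192); err != nil {
		fmt.Println("guardrail tripped:", err)
		// at this point the agent could truncate or compact the logs and retry
		return
	}
	fmt.Println("prompt fits, sending to the model")
}
```

A real guardrail would replace the character heuristic with per-provider token counting, but the shape of the check would be the same.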
I think this area requires a little bit of research and tinkering. We can take a look at other agents to learn the tricks in this space and also invent our own. I know Claude Code offers a /compact command for compacting the chat conversation, which could be applicable here as well.
Many ideas are possible here:
- Offer /compact-like functionality to manage context without losing too much info.
- Tweak the prompt to be less verbose and more direct without losing accuracy.
- Detect the context-limit-reached error, compact, and retry (or maybe ask the human to run compact and retry); see the sketch below.
- Proactively detect that we are about to hit the context limit and take action before sending.
Overall, I think context engineering is going to be a long-running challenge for us and no single trick is going to solve all the problems. So the first step is understanding the current state of the art employed by leading agents and bringing those methods to kubectl-ai.
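To make the reactive idea from the list above concrete, here is a minimal sketch of "detect the limit error, compact, retry". `callModel` and `compact` are stand-ins invented for illustration, not kubectl-ai's real types or API:

```go
// Toy model of reactive compaction: if the provider rejects the request for
// exceeding the context window, collapse older conversation turns into a
// summary placeholder and retry once.
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errContextExceeded = errors.New("input token count exceeds the maximum allowed")

// callModel stands in for the real LLM request; in this toy example it fails
// whenever the flattened conversation is "too big".
func callModel(turns []string) (string, error) {
	if len(strings.Join(turns, "\n")) > 200 {
		return "", errContextExceeded
	}
	return "model response", nil
}

// compact keeps the most recent turns verbatim and collapses everything older
// into a single placeholder summary, similar in spirit to /compact.
func compact(turns []string, keepRecent int) []string {
	if len(turns) <= keepRecent {
		return turns
	}
	summary := fmt.Sprintf("[summary of %d earlier turns]", len(turns)-keepRecent)
	return append([]string{summary}, turns[len(turns)-keepRecent:]...)
}

func main() {
	turns := []string{
		strings.Repeat("very long log output ", 20),
		"user: why is the pod crashing?",
		"assistant: let me check the events",
	}
	for attempt := 0; attempt < 2; attempt++ {
		resp, err := callModel(turns)
		if errors.Is(err, errContextExceeded) {
			// Reactive path: the provider rejected the request, so compact and retry.
			turns = compact(turns, 2)
			continue
		}
		if err != nil {
			fmt.Println("unrecoverable error:", err)
			return
		}
		fmt.Println(resp)
		return
	}
}
```

The proactive variant from the last bullet would run the same compaction before sending, whenever the estimated request size approaches the limit, instead of waiting for the provider to reject it.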
Thanks @droot @zvdy, I was looking for something like this.