error when fetching logs
Running: kubectl logs kong-gateway-54cff884d5-jd8pt -n omnimizer-system
Error: reading streaming LLM response: Error 400, Message: The input token count (2002605) exceeds the maximum number of tokens allowed (1048576)., Status: INVALID_ARGUMENT, Details: []
This is due to your input size: the task was probably too big for the model you are using, so the request's input exceeded the model's token limit. It is not an error in the repo itself.
Try reducing the input size. I lack background on your setup, since this depends directly on your LLM and model provider, but here is some reading about what tokens are and how they work.
From the error it is clear that the token limit was 1M and request size was around 2M.
Not sure what exact task you had in mind. You could try asking for only the most recent 500 lines of logs (the equivalent of `kubectl logs --tail=500`) if that helps.
@zvdy even though this is not an error in the repo itself, I do think context size management is an area where we haven't done much work, and I am sure there is some low-hanging fruit to improve the overall user experience.
Agreed. We could fetch the model's token limits and guardrail the request/response context based on them. Is this what you had in mind?
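Something along the lines of this minimal sketch, where `estimateTokens`, `guardContext`, and the 4-characters-per-token heuristic are all hypothetical placeholders for illustration, not anything kubectl-ai actually ships:

```go
// Sketch of a pre-send guardrail: check an estimated request size against
// the model's context window before calling the provider.
package main

import (
	"errors"
	"fmt"
)

// estimateTokens is a crude approximation: roughly 4 characters per token
// for English text. A real implementation would use the provider's own
// tokenizer or token-counting endpoint where available.
func estimateTokens(text string) int {
	return len(text)/4 + 1
}

var errContextTooLarge = errors.New("request would exceed the model's context window")

// guardContext rejects (or could truncate) a prompt that is likely to blow
// past the model's input limit, leaving headroom reserved for the response.
func guardContext(prompt string, modelInputLimit, responseReserve int) error {
	budget := modelInputLimit - responseReserve
	if est := estimateTokens(prompt); est > budget {
		return fmt.Errorf("%w: estimated %d tokens, budget %d", errContextTooLarge, est, budget)
	}
	return nil
}

func main() {
	// 1,048,576 is the limit reported in the error above.
	const limit = 1_048_576
	prompt := "kubectl logs output ..." // imagine several MB of log text here
	if err := guardContext(prompt, limit, 8_192); err != nil {
		fmt.Println("guardrail tripped:", err)
		// at this point the agent could truncate or compact the logs and retry
		return
	}
	fmt.Println("prompt fits, sending to the model")
}
```

A real guardrail would replace the character heuristic with per-provider token counting, but the shape of the check would be the same.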
I think this area requires a little bit of research and tinkering. We can take a look at other agents to learn the tricks in this space and also invent our own. I know Claude Code offers a /compact command for compacting the chat conversation, which could be applicable here as well.
Many ideas are possible here:
- Offer /compact-like functionality to manage context without losing too much info.
- Tweak the prompt to be less verbose and more direct without losing accuracy.
- Detect the context-limit-reached error, compact, and retry (or maybe ask the human to run compact and retry); see the sketch below.
- Proactively detect that we are about to hit the context limit and take action before sending.
Overall, I think context engineering is going to be a long-running challenge for us and no single trick is going to solve all the problems. So the first step is understanding the current state of the art employed by leading agents and bringing those methods to kubectl-ai.
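To make the reactive idea from the list above concrete, here is a minimal sketch of "detect the limit error, compact, retry". `callModel` and `compact` are stand-ins invented for illustration, not kubectl-ai's real types or API:

```go
// Toy model of reactive compaction: if the provider rejects the request for
// exceeding the context window, collapse older conversation turns into a
// summary placeholder and retry once.
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errContextExceeded = errors.New("input token count exceeds the maximum allowed")

// callModel stands in for the real LLM request; in this toy example it fails
// whenever the flattened conversation is "too big".
func callModel(turns []string) (string, error) {
	if len(strings.Join(turns, "\n")) > 200 {
		return "", errContextExceeded
	}
	return "model response", nil
}

// compact keeps the most recent turns verbatim and collapses everything older
// into a single placeholder summary, similar in spirit to /compact.
func compact(turns []string, keepRecent int) []string {
	if len(turns) <= keepRecent {
		return turns
	}
	summary := fmt.Sprintf("[summary of %d earlier turns]", len(turns)-keepRecent)
	return append([]string{summary}, turns[len(turns)-keepRecent:]...)
}

func main() {
	turns := []string{
		strings.Repeat("very long log output ", 20),
		"user: why is the pod crashing?",
		"assistant: let me check the events",
	}
	for attempt := 0; attempt < 2; attempt++ {
		resp, err := callModel(turns)
		if errors.Is(err, errContextExceeded) {
			// Reactive path: the provider rejected the request, so compact and retry.
			turns = compact(turns, 2)
			continue
		}
		if err != nil {
			fmt.Println("unrecoverable error:", err)
			return
		}
		fmt.Println(resp)
		return
	}
}
```

The proactive variant from the last bullet would run the same compaction before sending, whenever the estimated request size approaches the limit, instead of waiting for the provider to reject it.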
Thanks @droot @zvdy, I was looking for something like this.