kubectl-ai

[Bug]: fix-oomkilled eval can pass without an LLM connection (setup does not cause OOMKilled)

Open · noahlwest opened this issue 3 months ago · 1 comment

Environment:

  • OS: Debian testing
  • kubectl-ai version: 0.0.26
  • LLM provider: none
  • LLM model: none
  • Cluster type: KinD (version 0.11.1)

Describe the bug: If you run the fix-oomkilled eval without a valid LLM connection (for example, the openai provider with the gemini-2.5-pro model, pointed at a non-existent OpenAI endpoint), the eval can still pass.

To Reproduce

Steps to reproduce the behavior:

  1. Run the fix-oomkilled eval with an invalid LLM configuration, for example: ./dev/ci/periodics/run-eval-loop.sh -i 1 -p openai -m gemini-2.5-flash -a http://localhost:8000/v1 -c 5 -t "fix-oomkilled"
  2. Observe the LLM errors in the output; the eval still passes.

Expected behavior: The eval should fail when there is no LLM connection.
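A minimal sketch of the scoring rule that this expected behavior implies. It assumes the eval harness records three hypothetical inputs (`setup_reproduced_issue`, `llm_responses`, `verifier_passed`); none of these names come from kubectl-ai. The idea is that a passing verifier only counts if the setup actually broke the cluster and the agent actually got a response from the LLM.

```python
from enum import Enum


class EvalResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    INVALID = "invalid"  # hypothetical: setup never produced the broken state


def score_eval(setup_reproduced_issue: bool, llm_responses: int, verifier_passed: bool) -> EvalResult:
    """Hypothetical scoring rule, not kubectl-ai's actual eval logic."""
    # If the setup never broke the cluster, a passing verifier proves nothing.
    if not setup_reproduced_issue:
        return EvalResult.INVALID
    # With zero successful LLM responses the agent cannot have fixed anything.
    if llm_responses == 0:
        return EvalResult.FAIL
    return EvalResult.PASS if verifier_passed else EvalResult.FAIL
```

Separating INVALID from FAIL is a design choice in this sketch: it keeps a broken setup (the OOMKilled problem below) from being reported as an agent failure.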

Additional context: Applying the YAML in this eval's artifacts directory does not reliably produce an OOMKilled pod, at least not on a KinD cluster. I tried several ways to get the pod OOMKilled and could not make it happen reliably. #572 makes the setup script fail if it does not see OOMKilled events, but that does not solve the underlying problem: we still cannot trigger OOMKilled in the first place. I was able to force a crash with a small Python program that continuously allocates memory, but the pod then terminates with reason 'Error' and exit code 137, not OOMKilled. We need a setup that reliably causes OOMKilled while still being something the agent can reasonably fix.
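One way for a setup check to tell a real OOMKilled termination apart from the 'Error' / 137 case described above is to read the termination reason from the pod status instead of relying on the exit code (137 is 128 + SIGKILL, which does not by itself mean OOMKilled). This sketch assumes its input is the parsed JSON from `kubectl get pod <name> -o json`; the function names are hypothetical and not part of kubectl-ai or #572.

```python
from typing import Optional, Tuple


def termination_reason(container_status: dict) -> Optional[Tuple[str, int]]:
    """Return (reason, exitCode) of the most recent termination, if any.

    Checks the current state first, then lastState (which is where the
    previous termination shows up while the container is waiting in
    CrashLoopBackOff or has been restarted and is running again).
    """
    for key in ("state", "lastState"):
        terminated = container_status.get(key, {}).get("terminated")
        if terminated:
            return terminated.get("reason", ""), terminated.get("exitCode", -1)
    return None


def saw_oomkilled(pod: dict) -> bool:
    """True only if some container reports the reason "OOMKilled".

    Exit code 137 alone is not accepted: as observed in this issue,
    a container can exit with 137 and reason "Error".
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        result = termination_reason(cs)
        if result and result[0] == "OOMKilled":
            return True
    return False
```

A setup script could call `saw_oomkilled(json.loads(output))` on the kubectl output in a polling loop and fail if it never returns True within a timeout.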

Posted by noahlwest on Oct 15 '25 at 15:10.

droot commented on Oct 16 '25 at 02:10:

@prasad89