noahlwest
Adds the find-application-level-failures eval. This eval covers the case where the agent must troubleshoot past a healthy k8s environment while some application-level failures are apparent in the...
I noticed this while trying to create the eval in #295; it happens quite often when trying to list data from kubectl logs. Ran with `./cmd --v 5 --enable-tool-use-shim` ```>>>...
generate_benchmark.py: generates the pages that display k8s-ai-bench data; combined_results.jsonl: the benchmark data used while generating the site; about.html: a quick about page to describe k8s-ai-bench...
**Environment:** - OS: Debian testing - kubectl-ai version: 0.0.26 - LLM provider: none - LLM model: none - Cluster type: KinD (version 0.11.1) **Describe the bug** If you try to...
#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for AWS Bedrock. This can happen in...
#443 added session persistence for Gemini, and we should have tests for it. Basic flows like saving and loading sessions, creating a new session, deleting a session, and listing sessions...
#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for Azure OpenAI. This can happen in...
#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for Grok. This can happen in gollm/grok.go...
#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for the OpenAI provider. This can happen in...
I was experimenting with Z.ai's GLM 4.6 model, served through an OpenAI-compatible endpoint from vLLM, and noticed that it gives output in a slightly different format from other models. This change...