noahlwest

15 issues by noahlwest

Adds find-application-level-failures eval. The goal of this eval is to cover the case where the agent has to troubleshoot past a healthy k8s environment because some application-level failures are apparent in the...

I noticed this while trying to create the eval in #295; it happens quite often when trying to list data from `kubectl logs`. Ran with `./cmd --v 5 --enable-tool-use-shim` ```>>>...

- `generate_benchmark.py`: generates the pages to display k8s-ai-bench data
- `combined_results.jsonl`: the benchmark data used while generating the site
- `about.html`: a quick about page to describe k8s-ai-bench...

**Environment:**
- OS: Debian testing
- kubectl-ai version: 0.0.26
- LLM provider: none
- LLM model: none
- Cluster type: KinD (version 0.11.1)

**Describe the bug**
If you try to...

bug

#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for AWS Bedrock. This can happen in...

enhancement
help wanted
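
Purely as illustration (the same shape applies to the matching Azure OpenAI, Grok, and OpenAI issues below), an Initialize() that loads persisted history might look like this minimal sketch; the Chat struct, the ChatMessage shape, and the one-JSON-file-per-session layout are assumptions here, not gollm's actual API:

```go
package bedrock

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ChatMessage is a hypothetical persisted message shape; the real
// on-disk format is whatever #443 established for Gemini.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Chat stands in for the provider's chat client in gollm.
type Chat struct {
	sessionDir string
	history    []ChatMessage
}

// Initialize loads any previously saved history for sessionID so the
// conversation can resume across invocations.
func (c *Chat) Initialize(sessionID string) error {
	path := filepath.Join(c.sessionDir, sessionID+".json")
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		c.history = nil // no prior session; start fresh
		return nil
	}
	if err != nil {
		return fmt.Errorf("reading session %q: %w", sessionID, err)
	}
	if err := json.Unmarshal(data, &c.history); err != nil {
		return fmt.Errorf("decoding session %q: %w", sessionID, err)
	}
	return nil
}
```

Saving would mirror this with a json.Marshal plus os.WriteFile after each turn.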

#443 added session persistence for Gemini, and we should have tests for it. Basic flows like saving and loading sessions, creating a new session, deleting a session, and listing sessions...

enhancement
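
A sketch of what those basic-flow tests could look like, run here against a trivial in-memory stand-in because the real store API from #443 isn't quoted above:

```go
package session

import (
	"reflect"
	"testing"
)

// memStore is a minimal in-memory stand-in for the real session store,
// just so this sketch compiles on its own.
type memStore struct{ sessions map[string][]string }

func newMemStore() *memStore { return &memStore{sessions: map[string][]string{}} }

func (m *memStore) Save(id string, h []string) error { m.sessions[id] = h; return nil }
func (m *memStore) Load(id string) ([]string, error) { return m.sessions[id], nil }
func (m *memStore) Delete(id string) error           { delete(m.sessions, id); return nil }
func (m *memStore) List() []string {
	ids := make([]string, 0, len(m.sessions))
	for id := range m.sessions {
		ids = append(ids, id)
	}
	return ids
}

func TestSessionRoundTrip(t *testing.T) {
	store := newMemStore()

	// Saving then loading a session should round-trip the history.
	want := []string{"hello", "world"}
	if err := store.Save("s1", want); err != nil {
		t.Fatalf("Save: %v", err)
	}
	got, err := store.Load("s1")
	if err != nil {
		t.Fatalf("Load: %v", err)
	}
	if !reflect.DeepEqual(got, want) {
		t.Errorf("Load = %v, want %v", got, want)
	}

	// Listing should show the session; deleting should remove it.
	if ids := store.List(); len(ids) != 1 || ids[0] != "s1" {
		t.Errorf("List = %v, want [s1]", ids)
	}
	if err := store.Delete("s1"); err != nil {
		t.Fatalf("Delete: %v", err)
	}
	if ids := store.List(); len(ids) != 0 {
		t.Errorf("List after Delete = %v, want empty", ids)
	}
}
```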

#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for Azure OpenAI. This can happen in...

enhancement
help wanted

#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for Grok. This can happen in gollm/grok.go...

enhancement
help wanted

#443 added session persistence for Gemini, but other providers were left unimplemented. We should add a similar implementation of Initialize() to load chat history for the OpenAI provider. This can happen in...

enhancement
help wanted

I was experimenting with Z.ai's GLM 4.6 model served via an OpenAI-compatible endpoint from vLLM, and noticed that it gives output in a slightly different format than other models do. This change...
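
For context, vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint; a minimal sketch of calling one from Go follows, where the base URL and model name are placeholders for a local deployment:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

func main() {
	// vLLM serves the OpenAI-compatible API on port 8000 by default;
	// the model name is whatever the server was started with.
	body, err := json.Marshal(chatRequest{
		Model:    "zai-org/GLM-4.6", // placeholder
		Messages: []chatMessage{{Role: "user", Content: "List my pods"}},
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:8000/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// Where models diverge is usually inside the returned message
	// content (e.g. how tool calls or markup are wrapped), which is
	// the kind of variation a change like the one above has to handle.
	fmt.Println(out.Choices[0].Message.Content)
}
```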