
`holmes ask` does not strictly follow the TSG provided as a prompt

Open mainred opened this issue 6 months ago • 2 comments

What happened?

I created a TSG (troubleshooting guide) prompt, expecting `holmes ask` to follow it to debug a DNS resolution issue in my cluster, and ran:

holmes ask "detect why the k8s pod client under namespace test-ns cannot resolve dns" -f /home/azureuser/llm/demo/dns_troubleshooting_instructions.md


......

Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide pods -n kube-system                                         tools.py:143
Running tool kubectl_get_by_kind_in_namespace: kubectl get --show-labels -o wide svc -n kube-system                                          tools.py:143
Running tool trace_dns_gadget: fetch DNS queries and responses of pod test-ns/client for 30 seconds                                          tools.py:143
 tool_calling_llm.call - completed in 2 iterations - 83741ms                                                                      performance_timing.py:41
AI: DNS resolution from pod client in namespace test-ns is failing due to queries not receiving responses from the DNS service (kube-dns) at ClusterIP   
10.0.0.10. The DNS trace shows repeated outgoing queries for nginx-service.test-ns.test-ns.svc.cluster.local. without any responses.

Key findings:

 • CoreDNS pods (coredns-57d886c994-8h9gt, coredns-57d886c994-vqmrz) are running normally in the kube-system namespace.
 • DNS service (kube-dns) is correctly defined with ClusterIP 10.0.0.10 and ports 53/UDP and 53/TCP.
 • DNS queries from the pod client are sent correctly to the DNS service IP but no responses are received.

Possible causes:

 1 NetworkPolicy blocking DNS traffic to/from the DNS pods.
 2 Misconfiguration in CoreDNS causing it not to respond.

Next steps:

 • Check NetworkPolicies in the cluster to ensure DNS traffic (port 53 UDP/TCP) is allowed.
 • Review CoreDNS configuration for potential misconfigurations.

Refer to the official Kubernetes DNS debugging guide for detailed troubleshooting:

 • Main guide: Kubernetes DNS Debugging
 • CoreDNS specific: CoreDNS Customization

The command stops after only a few iterations, before finding the root cause, which could have been detected by following the TSG.

Besides the TSG prompt, I also created an Inspektor Gadget toolset to collect DNS traces (the `trace_dns_gadget` tool in the log above).

What did you expect to happen?

`holmes ask` should follow the TSG to detect the reported error and report how to fix the issue.

How can we reproduce it (as minimally and precisely as possible)?

I tried the command with gpt-4o and gpt-4.5; both failed as described above.

holmes ask "detect why the k8s pod client under namespace test-ns cannot resolve dns" -f /home/azureuser/llm/demo/dns_troubleshooting_instructions.md

Anything else we need to know?

No response

mainred, May 21 '25 13:05

Hi @mainred, thank you very much for reporting. We're looking into it! Is there an easy way to reproduce the DNS error on my own cluster to test this?

aantn, May 21 '25 14:05

Basically, I created a NetworkPolicy that blocks DNS requests from the client pod (it denies all egress for every pod in test-ns). The specs are below:

cat block-dns.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: test-ns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress: []
---
apiVersion: v1
kind: Pod
metadata:
  name: client
  namespace: test-ns
  labels:
    app.kubernetes.io/name: client
spec:
  containers:
  - image: mainred/client:v2
    imagePullPolicy: Always
    name: test-client
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-ns
  labels:
    app.kubernetes.io/name: server
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
      - containerPort: 80
        name: http-web-svc

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: test-ns
spec:
  selector:
    app.kubernetes.io/name: server
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc
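For comparison, here is a minimal sketch of the kind of fix a correct diagnosis should point to: an additional NetworkPolicy that re-allows DNS egress from test-ns to the cluster DNS pods while `default-deny-egress` stays in place. It is not part of the original repro. It assumes the CoreDNS pods carry the `k8s-app: kube-dns` label and that the kube-system namespace has the automatic `kubernetes.io/metadata.name` label; the policy name `allow-dns-egress` is hypothetical.

```yaml
# Hypothetical companion policy (not part of the original repro).
# NetworkPolicies are additive: this egress rule is unioned with
# default-deny-egress, so only the traffic listed here is re-allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress          # hypothetical name
  namespace: test-ns
spec:
  podSelector: {}                 # selects every pod in test-ns, including client
  policyTypes:
  - Egress
  egress:
  - to:
    # namespaceSelector and podSelector in the same element are ANDed:
    # pods labelled k8s-app=kube-dns inside the kube-system namespace.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns       # assumed label on the CoreDNS pods
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

This only restores name resolution; the client's HTTP traffic to nginx-service on port 80 would still be denied by `default-deny-egress`.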

mainred, May 21 '25 14:05