kube-state-metrics

Node selection for fully qualified node-names fails (--node=ip-xx-xx-xx-xx.myzone.com)

Open diranged opened this issue 1 year ago • 3 comments

What happened:

I’m trying to run the kube-state-metrics pods in DaemonSet mode with --resources=pods and --node=$(NODE_NAME). In local testing on a Kind environment, it worked fine. However, when I run it in a real EKS cluster, I get odd behavior: the fieldSelector is created with the node name, but the dots are missing:

eg:

containers:
- args:
  - -v=7
  - --resources=pods
  - --node="$(NODE_NAME)"
  - --port=8080
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: spec.nodeName

and then we see this:

I0417 20:56:30.604141       1 server.go:339] "Started kube-state-metrics self metrics server" telemetryAddress=":8081"
I0417 20:56:30.604284       1 builder.go:520] "FieldSelector is used" fieldSelector="spec.nodeName=ip-100-80-189-206us-west-2computeinternal"
I0417 20:56:30.604321       1 builder.go:282] "Active resources" activeStoreNames="pods"
I0417 20:56:30.604332       1 reflector.go:289] Starting reflector *v1.Pod (0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0417 20:56:30.604342       1 reflector.go:325] Listing and watching *v1.Pod from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0417 20:56:30.604381       1 server.go:73] levelinfomsgListening onaddress:8080
I0417 20:56:30.604414       1 server.go:73] levelinfomsgTLS is disabled.http2falseaddress:8080
I0417 20:56:30.604419       1 server.go:73] levelinfomsgListening onaddress:8081
I0417 20:56:30.604429       1 server.go:73] levelinfomsgTLS is disabled.http2falseaddress:8081
I0417 20:56:30.604442       1 round_trippers.go:463] GET https://172.20.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-100-80-189-206us-west-2computeinternal&limit=500&resourceVersion=0
I0417 20:56:30.604450       1 round_trippers.go:469] Request Headers:
I0417 20:56:30.604456       1 round_trippers.go:473]     Accept: application/vnd.kubernetes.protobuf,application/json

We can verify that we are passing ip-100-80-189-206.us-west-2.compute.internal into the CLI arg properly:

[root@admin]# ps -ef  | grep kube-state
65534    1343367 1343293  0 20:56 ?        00:00:00 /kube-state-metrics --port=8080 --telemetry-port=8081 -v=7 --resources=pods --node="ip-100-80-189-206.us-west-2.compute.internal" --port=8080
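
For comparison, here is a minimal sketch of the query string one would expect if the node name were passed through unmodified. It uses only Go's standard library; the helper name listPodsQuery is hypothetical and is not part of kube-state-metrics or client-go. The limit and resourceVersion values are copied from the request in the log above.

```go
package main

import (
	"fmt"
	"net/url"
)

// listPodsQuery builds the query string for a pod list request filtered by
// node name. The name and shape of this helper are assumptions made for
// illustration only.
func listPodsQuery(nodeName string) string {
	q := url.Values{}
	q.Set("fieldSelector", "spec.nodeName="+nodeName)
	q.Set("limit", "500")
	q.Set("resourceVersion", "0")
	// Encode escapes '=' as %3D, leaves '.' and '-' alone, and sorts by key.
	return q.Encode()
}

func main() {
	fmt.Println(listPodsQuery("ip-100-80-189-206.us-west-2.compute.internal"))
}
```

With the dots preserved, the selector would read spec.nodeName%3Dip-100-80-189-206.us-west-2.compute.internal, which is the value the API server needs in order to match pods on that node.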

The reason we looked into it is because the pod is coming up - but it’s not reporting any metrics:

% curl -v localhost:8080/metrics
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET /metrics HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4; charset=utf-8
< Date: Wed, 17 Apr 2024 21:00:28 GMT
< Content-Length: 0
< 
* Connection #0 to host localhost left intact

After digging, I found https://github.com/kubernetes/kube-state-metrics/pull/2217 which introduced a Regex Pattern that only matches hostnames, and not FQDNs at https://github.com/kubernetes/kube-state-metrics/blob/d1f04c2479c792d15e420255d5c6829fdd95766c/pkg/options/types.go#L142-L154.
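
To illustrate the kind of mutation described above, here is a minimal sketch of an allow-list sanitizer. Both patterns are assumptions written for this example and are not copied from types.go: the first drops every character that is not a letter, digit, underscore, or hyphen, and the second additionally keeps dots.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern (an assumption, not the one in types.go): matches any
// run of characters other than letters, digits, '_' and '-'.
var notHostnameChar = regexp.MustCompile(`[^a-zA-Z0-9_-]+`)

// Hypothetical variant that also treats '.' as allowed, so FQDNs survive.
var notFQDNChar = regexp.MustCompile(`[^a-zA-Z0-9_.-]+`)

// sanitizeHostnameOnly removes every character outside the hostname-only set.
func sanitizeHostnameOnly(s string) string {
	return notHostnameChar.ReplaceAllString(s, "")
}

// sanitizeFQDN removes every character outside the set that includes dots.
func sanitizeFQDN(s string) string {
	return notFQDNChar.ReplaceAllString(s, "")
}

func main() {
	node := "ip-100-80-189-206.us-west-2.compute.internal"
	fmt.Println(sanitizeHostnameOnly(node)) // dots are stripped
	fmt.Println(sanitizeFQDN(node))         // dots are kept
}
```

Under these assumptions, the hostname-only variant turns the node name into ip-100-80-189-206us-west-2computeinternal, the same dotless value that shows up in the fieldSelector log line.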

What you expected to happen:

I expect that the input we pass in will be the input that is used - whether it is correct or not. I was completely thrown to see the code mutating my input, and effectively making the fieldSelector invalid.
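
One way to get that behavior, sketched here only as an illustration and not as the project's actual fix, is to validate the value and return an error instead of rewriting it. The sketch assumes node names follow the Kubernetes DNS subdomain naming rule (lowercase alphanumerics, '-' and '.', at most 253 characters); validateNodeName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsSubdomain approximates the Kubernetes DNS subdomain naming rule.
// It is written out here for illustration rather than imported from a library.
var dnsSubdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateNodeName (hypothetical) rejects bad input without modifying it.
func validateNodeName(name string) error {
	if len(name) == 0 || len(name) > 253 {
		return fmt.Errorf("node name %q must be 1 to 253 characters long", name)
	}
	if !dnsSubdomain.MatchString(name) {
		return fmt.Errorf("node name %q is not a valid DNS subdomain", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"ip-100-80-189-206.us-west-2.compute.internal", "bad name!"} {
		if err := validateNodeName(n); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", n)
	}
}
```

With this approach an invalid value fails loudly at startup, and a valid FQDN reaches the fieldSelector exactly as it was passed in.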

Anything else we need to know?:

Environment:

  • kube-state-metrics version: 2.12.20
  • Kubernetes version (use kubectl version): 1.28.4
  • Cloud provider or hardware configuration: EKS
  • Other info:

diranged, Apr 17 '24 21:04

@CatherineF-dev put up a fix at https://github.com/kubernetes/kube-state-metrics/pull/2373 ... 🚤

diranged, Apr 17 '24 21:04

/triage accepted
/assign @CatherineF-dev

logicalhan, Apr 18 '24 16:04

@diranged even though we have not tested v2.13.0 with a DS for this, I think we can tell from static analysis of the code that it should be fixed now. We are also unlikely to test this, since the fixes in v2.13.0 also remove our need for a DS.

... so, tl;dr, I think we can close this issue and re-open later if we see it again.

P.S. We are still keeping https://github.com/kubernetes/kube-state-metrics/issues/2372 open until we confirm that a KSM upgrade no longer causes the stale metrics issue that the DS was going to be a workaround for.

LaikaN57, Jul 25 '24 21:07