podmetrics.metrics.k8s.io from buildkit not found
We're having some issues with buildkit pod metrics, roughly ever since the k8s 1.25 -> 1.26 upgrade (though I'm not 100% sure whether this is just a coincidence). Basically, both our Datadog agents and metrics-server are having trouble getting pod metrics from buildkit (running v0.12.3).
Output from kubectl top pod:
➜ k top pod buildkit-deployment-575959cf77-rb94w --v=10 -n buildkit
I1108 12:19:10.699666 12557 round_trippers.go:466] curl -v -XGET -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.28.3 (darwin/arm64) kubernetes/a8a1abc" 'https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/api'
I1108 12:19:11.455471 12557 round_trippers.go:495] HTTP Trace: DNS Lookup for 6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com resolved to [{13.48.241.241 } {13.48.231.68 }]
I1108 12:19:11.467312 12557 round_trippers.go:510] HTTP Trace: Dial to tcp:13.48.241.241:443 succeed
I1108 12:19:11.531419 12557 round_trippers.go:553] GET https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/api 200 OK in 831 milliseconds
I1108 12:19:11.531434 12557 round_trippers.go:570] HTTP Statistics: DNSLookup 3 ms Dial 11 ms TLSHandshake 27 ms ServerProcessing 36 ms Duration 831 ms
I1108 12:19:11.531439 12557 round_trippers.go:577] Response Headers:
I1108 12:19:11.531444 12557 round_trippers.go:580] Cache-Control: no-cache, private
I1108 12:19:11.531448 12557 round_trippers.go:580] Content-Type: application/json
I1108 12:19:11.531453 12557 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: dbb9ff33-f0ad-4827-ae96-c2bbc640e12b
I1108 12:19:11.531456 12557 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: e2207259-ae05-4d7f-9139-813d664e3a84
I1108 12:19:11.531460 12557 round_trippers.go:580] Content-Length: 167
I1108 12:19:11.531465 12557 round_trippers.go:580] Date: Wed, 08 Nov 2023 10:19:11 GMT
I1108 12:19:11.531468 12557 round_trippers.go:580] Audit-Id: fc4066db-894a-4c08-91b0-ebb3ab3b668a
I1108 12:19:11.531483 12557 request.go:1212] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"ip-172-16-110-156.eu-north-1.compute.internal:443"}]}
I1108 12:19:11.531668 12557 round_trippers.go:466] curl -v -XGET -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.28.3 (darwin/arm64) kubernetes/a8a1abc" 'https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/apis'
I1108 12:19:11.572713 12557 round_trippers.go:553] GET https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/apis 200 OK in 40 milliseconds
I1108 12:19:11.572752 12557 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 40 ms Duration 40 ms
I1108 12:19:11.572757 12557 round_trippers.go:577] Response Headers:
I1108 12:19:11.572763 12557 round_trippers.go:580] Audit-Id: 37e7d8c3-3b56-4014-ba00-ec2fd98b77a7
I1108 12:19:11.572768 12557 round_trippers.go:580] Cache-Control: no-cache, private
I1108 12:19:11.572773 12557 round_trippers.go:580] Content-Type: application/json
I1108 12:19:11.572777 12557 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: dbb9ff33-f0ad-4827-ae96-c2bbc640e12b
I1108 12:19:11.572781 12557 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: e2207259-ae05-4d7f-9139-813d664e3a84
I1108 12:19:11.572785 12557 round_trippers.go:580] Date: Wed, 08 Nov 2023 10:19:11 GMT
I1108 12:19:11.572922 12557 request.go:1212] Response Body: {"kind":"APIGroupList","apiVersion":"v1","groups":[{"name":"apiregistration.k8s.io","versions":[{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"}},{"name":"apps","versions":[{"groupVersion":"apps/v1","version":"v1"}],"preferredVersion":{"groupVersion":"apps/v1","version":"v1"}},{"name":"events.k8s.io","versions":[{"groupVersion":"events.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"events.k8s.io/v1","version":"v1"}},{"name":"authentication.k8s.io","versions":[{"groupVersion":"authentication.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"authentication.k8s.io/v1","version":"v1"}},{"name":"authorization.k8s.io","versions":[{"groupVersion":"authorization.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"authorization.k8s.io/v1","version":"v1"}},{"name":"autoscaling","versions":[{"groupVersion":"autoscaling/v2","version":"v2"},{"groupVersion":"autoscaling/v1","version":"v1"}],"preferredVersion":{"groupVersion":"autoscaling/v2","version":"v2"}},{"name":"batch","versions":[{"groupVersion":"batch/v1","version":"v1"}],"preferredVersion":{"groupVersion":"batch/v1","version":"v1"}},{"name":"certificates.k8s.io","versions":[{"groupVersion":"certificates.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"certificates.k8s.io/v1","version":"v1"}},{"name":"networking.k8s.io","versions":[{"groupVersion":"networking.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"networking.k8s.io/v1","version":"v1"}},{"name":"policy","versions":[{"groupVersion":"policy/v1","version":"v1"}],"preferredVersion":{"groupVersion":"policy/v1","version":"v1"}},{"name":"rbac.authorization.k8s.io","versions":[{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"}},{"name":"storage.k8s.io","versi
ons":[{"groupVersion":"storage.k8s.io/v1","version":"v1"},{"groupVersion":"storage.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"storage.k8s.io/v1","version":"v1"}},{"name":"admissionregistration.k8s.io","versions":[{"groupVersion":"admissionregistration.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"admissionregistration.k8s.io/v1","version":"v1"}},{"name":"apiextensions.k8s.io","versions":[{"groupVersion":"apiextensions.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"apiextensions.k8s.io/v1","version":"v1"}},{"name":"scheduling.k8s.io","versions":[{"groupVersion":"scheduling.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"scheduling.k8s.io/v1","version":"v1"}},{"name":"coordination.k8s.io","versions":[{"groupVersion":"coordination.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"coordination.k8s.io/v1","version":"v1"}},{"name":"node.k8s.io","versions":[{"groupVersion":"node.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"node.k8s.io/v1","version":"v1"}},{"name":"discovery.k8s.io","versions":[{"groupVersion":"discovery.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"discovery.k8s.io/v1","version":"v1"}},{"name":"flowcontrol.apiserver.k8s.io","versions":[{"groupVersion":"flowcontrol.apiserver.k8s.io/v1beta3","version":"v1beta3"},{"groupVersion":"flowcontrol.apiserver.k8s.io/v1beta2","version":"v1beta2"}],"preferredVersion":{"groupVersion":"flowcontrol.apiserver.k8s.io/v1beta3","version":"v1beta3"}},{"name":"getambassador.io","versions":[{"groupVersion":"getambassador.io/v2","version":"v2"},{"groupVersion":"getambassador.io/v1","version":"v1"},{"groupVersion":"getambassador.io/v1beta2","version":"v1beta2"},{"groupVersion":"getambassador.io/v1beta1","version":"v1beta1"},{"groupVersion":"getambassador.io/v3alpha1","version":"v3alpha1"}],"preferredVersion":{"groupVersion":"getambassador.io/v2","version":"v2"}},{"name":"kyverno.io","versions":[{"gro
upVersion":"kyverno.io/v1","version":"v1"},{"groupVersion":"kyverno.io/v2beta1","version":"v2beta1"},{"groupVersion":"kyverno.io/v1beta1","version":"v1beta1"},{"groupVersion":"kyverno.io/v2alpha1","version":"v2alpha1"},{"groupVersion":"kyverno.io/v1alpha2","version":"v1alpha2"}],"preferredVersion":{"groupVersion":"kyverno.io/v1","version":"v1"}},{"name":"argoproj.io","versions":[{"groupVersion":"argoproj.io/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"argoproj.io/v1alpha1","version":"v1alpha1"}},{"name":"crd.k8s.amazonaws.com","versions":[{"groupVersion":"crd.k8s.amazonaws.com/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"crd.k8s.amazonaws.com/v1alpha1","version":"v1alpha1"}},{"name":"datadoghq.com","versions":[{"groupVersion":"datadoghq.com/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"datadoghq.com/v1alpha1","version":"v1alpha1"}},{"name":"dynatrace.com","versions":[{"groupVersion":"dynatrace.com/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"dynatrace.com/v1alpha1","version":"v1alpha1"}},{"name":"external-secrets.io","versions":[{"groupVersion":"external-secrets.io/v1beta1","version":"v1beta1"},{"groupVersion":"external-secrets.io/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"external-secrets.io/v1beta1","version":"v1beta1"}},{"name":"generators.external-secrets.io","versions":[{"groupVersion":"generators.external-secrets.io/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"generators.external-secrets.io/v1alpha1","version":"v1alpha1"}},{"name":"karpenter.k8s.aws","versions":[{"groupVersion":"karpenter.k8s.aws/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"karpenter.k8s.aws/v1alpha1","version":"v1alpha1"}},{"name":"networking.k8s.aws","versions":[{"groupVersion":"networking.k8s.aws/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"networking.k8s.aws/v1alpha1","version":"v1alpha1"}},{"name":"
traefik.containo.us","versions":[{"groupVersion":"traefik.containo.us/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"traefik.containo.us/v1alpha1","version":"v1alpha1"}},{"name":"traefik.io","versions":[{"groupVersion":"traefik.io/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"traefik.io/v1alpha1","version":"v1alpha1"}},{"name":"vpcresources.k8s.aws","versions":[{"groupVersion":"vpcresources.k8s.aws/v1beta1","version":"v1beta1"},{"groupVersion":"vpcresources.k8s.aws/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"vpcresources.k8s.aws/v1beta1","version":"v1beta1"}},{"name":"wgpolicyk8s.io","versions":[{"groupVersion":"wgpolicyk8s.io/v1alpha2","version":"v1alpha2"}],"preferredVersion":{"groupVersion":"wgpolicyk8s.io/v1alpha2","version":"v1alpha2"}},{"name":"karpenter.sh","versions":[{"groupVersion":"karpenter.sh/v1alpha5","version":"v1alpha5"}],"preferredVersion":{"groupVersion":"karpenter.sh/v1alpha5","version":"v1alpha5"}},{"name":"rbacmanager.reactiveops.io","versions":[{"groupVersion":"rbacmanager.reactiveops.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"rbacmanager.reactiveops.io/v1beta1","version":"v1beta1"}},{"name":"external.metrics.k8s.io","versions":[{"groupVersion":"external.metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"external.metrics.k8s.io/v1beta1","version":"v1beta1"}},{"name":"metrics.k8s.io","versions":[{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}}]}
I1108 12:19:11.573282 12557 round_trippers.go:466] curl -v -XGET -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: kubectl/v1.28.3 (darwin/arm64) kubernetes/a8a1abc" 'https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/apis/metrics.k8s.io/v1beta1/namespaces/buildkit/pods/buildkit-deployment-575959cf77-rb94w'
I1108 12:19:11.639674 12557 round_trippers.go:553] GET https://6F753C08CB5B073408D87E9B6A225BB4.yl4.eu-north-1.eks.amazonaws.com/apis/metrics.k8s.io/v1beta1/namespaces/buildkit/pods/buildkit-deployment-575959cf77-rb94w 404 Not Found in 66 milliseconds
I1108 12:19:11.639690 12557 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 66 ms Duration 66 ms
I1108 12:19:11.639695 12557 round_trippers.go:577] Response Headers:
I1108 12:19:11.639702 12557 round_trippers.go:580] Date: Wed, 08 Nov 2023 10:19:11 GMT
I1108 12:19:11.639708 12557 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: dbb9ff33-f0ad-4827-ae96-c2bbc640e12b
I1108 12:19:11.639714 12557 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf
I1108 12:19:11.639719 12557 round_trippers.go:580] Cache-Control: no-cache, private
I1108 12:19:11.639724 12557 round_trippers.go:580] Cache-Control: no-cache, private
I1108 12:19:11.639729 12557 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: e2207259-ae05-4d7f-9139-813d664e3a84
I1108 12:19:11.639734 12557 round_trippers.go:580] Content-Length: 221
I1108 12:19:11.639740 12557 round_trippers.go:580] Audit-Id: 20c5f75a-c173-4bcd-8547-9e964422f0f6
I1108 12:19:11.639745 12557 round_trippers.go:580] Audit-Id: 20c5f75a-c173-4bcd-8547-9e964422f0f6
I1108 12:19:11.639773 12557 request.go:1210] Response Body:
00000000 6b 38 73 00 0a 0c 0a 02 76 31 12 06 53 74 61 74 |k8s.....v1..Stat|
00000010 75 73 12 c4 01 0a 06 0a 00 12 00 1a 00 12 07 46 |us.............F|
00000020 61 69 6c 75 72 65 1a 53 70 6f 64 6d 65 74 72 69 |ailure.Spodmetri|
00000030 63 73 2e 6d 65 74 72 69 63 73 2e 6b 38 73 2e 69 |cs.metrics.k8s.i|
00000040 6f 20 22 62 75 69 6c 64 6b 69 74 2f 62 75 69 6c |o "buildkit/buil|
00000050 64 6b 69 74 2d 64 65 70 6c 6f 79 6d 65 6e 74 2d |dkit-deployment-|
00000060 35 37 35 39 35 39 63 66 37 37 2d 72 62 39 34 77 |575959cf77-rb94w|
00000070 22 20 6e 6f 74 20 66 6f 75 6e 64 22 08 4e 6f 74 |" not found".Not|
00000080 46 6f 75 6e 64 2a 4f 0a 2d 62 75 69 6c 64 6b 69 |Found*O.-buildki|
00000090 74 2f 62 75 69 6c 64 6b 69 74 2d 64 65 70 6c 6f |t/buildkit-deplo|
000000a0 79 6d 65 6e 74 2d 35 37 35 39 35 39 63 66 37 37 |yment-575959cf77|
000000b0 2d 72 62 39 34 77 12 0e 6d 65 74 72 69 63 73 2e |-rb94w..metrics.|
000000c0 6b 38 73 2e 69 6f 1a 0a 70 6f 64 6d 65 74 72 69 |k8s.io..podmetri|
000000d0 63 73 28 00 32 00 30 94 03 1a 00 22 00 |cs(.2.0....".|
I1108 12:19:11.639993 12557 helpers.go:246] server response object: [{
"metadata": {},
"status": "Failure",
"message": "podmetrics.metrics.k8s.io \"buildkit/buildkit-deployment-575959cf77-rb94w\" not found",
"reason": "NotFound",
"details": {
"name": "buildkit/buildkit-deployment-575959cf77-rb94w",
"group": "metrics.k8s.io",
"kind": "podmetrics"
},
"code": 404
}]
Error from server (NotFound): podmetrics.metrics.k8s.io "buildkit/buildkit-deployment-575959cf77-rb94w" not found
Not quite sure how to continue debugging this tbh. Every other pod seems to output metrics just fine, even ones on the same nodes as buildkit, so I doubt it's any kind of security group issue etc.
It looks like the errors you're getting all seem to be from kubernetes? I can't see in the output anything specific to buildkit - there's no metrics that buildkit exposes that should be interfering with this kind of thing.
If this appeared during a kubernetes upgrade, it's likely to have been something to do with that, instead of an issue internal to buildkit?
Thanks, that might very well be. The curious thing is that metrics from all other applications and components except buildkit continue working just fine.
ya, podmetrics are entirely handled by kubernetes controllers. the whole point of podmetrics is that the services running on kubernetes know nothing about them. as for why it's not working for you, the place to start would be to find out what controller you're using for collecting podmetrics (likely metrics-server) and then checking the logs of that controller
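To make that starting point concrete, here is a minimal sketch (not from the thread) that prints `kubectl get --raw` commands for three hops worth checking. It assumes metrics-server is the collector, that it is registered under the usual `v1beta1.metrics.k8s.io` APIService name, and that it scrapes the kubelet's `/metrics/resource` endpoint; the node name is a placeholder you would fill in. The PodMetrics path is the one visible in the `--v=10` output above. The helper names `debug_paths` and `as_commands` are made up for illustration.

```python
def debug_paths(namespace: str, pod: str, node: str) -> dict:
    """Return raw API paths for checking each hop of the pod-metrics pipeline."""
    return {
        # Is the metrics API registered and Available? (assumes the usual APIService name)
        "apiservice": "/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io",
        # What the kubelet on the pod's node reports (assumed scrape source for metrics-server)
        "kubelet": f"/api/v1/nodes/{node}/proxy/metrics/resource",
        # The aggregated PodMetrics object that `kubectl top pod` requests
        "podmetrics": f"/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/{pod}",
    }


def as_commands(paths: dict) -> list:
    """Turn the paths into copy-pasteable kubectl commands, in insertion order."""
    return [f"kubectl get --raw '{path}'" for path in paths.values()]


if __name__ == "__main__":
    # "<node-name>" is a placeholder; take it from `kubectl get pod -o wide`.
    paths = debug_paths("buildkit", "buildkit-deployment-575959cf77-rb94w", "<node-name>")
    for command in as_commands(paths):
        print(command)
```

If the kubelet output has no samples for the buildkit pod, the gap is probably on the node side rather than in metrics-server itself; if the samples are there, the collector's logs are the next place to look.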
I'm going to close this issue then I think, since it's confirmed not to be a buildkit-specific issue (thanks @nicks!).
@bcha if you find any more details that make it clear that it is actually a buildkit issue, then we can re-open :tada:
@jedevc Yeah so I spent some more time debugging this.
On Bottlerocket nodes, when I downgraded to buildkit 0.11.6, the metrics started working fine. Should be easily reproducible. The image tag is the only difference between these two examples:
buildkit v0.11.6 on bottlerocket:
➜ k top pod
NAME CPU(cores) MEMORY(bytes)
buildkit-helmeded-buildkit-service-7b6cdcddb5-mg5dm 3m 10Mi
buildkit v0.12.0 on bottlerocket:
➜ k top pod
error: Metrics not available for pod buildkit-helmeded/buildkit-helmeded-buildkit-service-5bdf4d9664-cpwxg, age: 3m48.164019s
On regular amazon linux nodes buildkit >=0.12.0 works fine, so this seems to be some combination of issues between buildkit, k8s >=1.26 & bottlerocket security hardenings.
I can't seem to find anything relevant in the buildkit logs.
Weiiird. Any chance you have a pod spec you could share?
I think it's worth re-opening then, since in your example it looks like you're just changing the buildkit version, and nothing else and then seeing the issue.
I wonder if this could be related to the cgroupsv2 related things we worked on for v0.12, specifically https://github.com/moby/buildkit/pull/4003 or https://github.com/moby/buildkit/pull/3860 (cc @tonistiigi @AkihiroSuda).
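One way to test the cgroup v2 hunch, sketched under assumptions: compare the cgroup path that buildkitd reports in `/proc/<pid>/cgroup` on a working and a failing version. On a cgroup v2 (unified) host that file contains a single `0::<path>` entry, and the sketch below only parses that text. Whether a differing path is what hides the container from the kubelet's stats is a hypothesis, not something confirmed in this thread, and the function names are made up for illustration.

```python
from typing import Optional


def unified_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/<pid>/cgroup contents, or None.

    Each line has the form `hierarchy-ID:controller-list:path`; the cgroup v2
    entry is the one with hierarchy ID 0 and an empty controller list.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    return None


def is_pure_v2(proc_cgroup_text: str) -> bool:
    """True when the unified entry is the only one (no v1 hierarchies listed)."""
    lines = [line for line in proc_cgroup_text.splitlines() if line.strip()]
    return len(lines) == 1 and unified_cgroup_path(proc_cgroup_text) is not None
```

The input could come from something like `kubectl exec <pod> -- cat /proc/1/cgroup`, assuming buildkitd runs as PID 1 in the container; running it against a v0.11.6 pod and a v0.12.x pod on the same node type would show whether the reported path differs.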
Yeah, tell me about it 😁 I suspected cgroupsv2 a bit myself earlier too, but it was just a hunch and I didn't look into it.
Of course, here are pod specs:
v0.12.3:
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2023-11-13T10:21:55Z"
    generateName: buildkit-helmeded-buildkit-service-886cb8656-
    labels:
      app.kubernetes.io/instance: buildkit-helmeded
      app.kubernetes.io/name: buildkit-service
      pod-template-hash: 886cb8656
    name: buildkit-helmeded-buildkit-service-886cb8656-dlfr4
    namespace: buildkit-helmeded
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: buildkit-helmeded-buildkit-service-886cb8656
      uid: 5c31c42d-ad33-40d9-b0d8-678896b7e113
    resourceVersion: "1227031494"
    uid: 17d558a9-faab-4cc0-b5a6-5cf7ac55cd5f
  spec:
    containers:
    - args:
      - --addr
      - unix:///run//buildkit/buildkitd.sock
      - --addr
      - tcp://0.0.0.0:1234
      - --debug
      image: moby/buildkit:v0.12.3
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
          - buildctl
          - debug
          - workers
        failureThreshold: 3
        initialDelaySeconds: 5
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 1
      name: buildkit-service
      ports:
      - containerPort: 1234
        name: tcp
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - buildctl
          - debug
          - workers
        failureThreshold: 3
        initialDelaySeconds: 5
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 1
      resources: {}
      securityContext:
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-58mm8
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    nodeName: ip-10-0-4-163.eu-north-1.compute.internal
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: kube-api-access-58mm8
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T10:21:55Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T10:22:25Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T10:22:25Z"
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T10:21:55Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://be12e385be22615ce91b12565a34a8e1a663404c4c8efb35fb9de8421883758c
      image: docker.io/moby/buildkit:v0.12.3
      imageID: docker.io/moby/buildkit@sha256:d4187a7326f20d04fafd075f80ccc5d3f8cfd4f665c6e03d158a78e4f64bf3db
      lastState: {}
      name: buildkit-service
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2023-11-13T10:22:00Z"
    hostIP: 10.0.4.163
    phase: Running
    podIP: 10.0.1.250
    podIPs:
    - ip: 10.0.1.250
    qosClass: BestEffort
    startTime: "2023-11-13T10:21:55Z"
kind: List
metadata:
  resourceVersion: ""
v0.11.6:
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2023-11-13T11:15:45Z"
    generateName: buildkit-helmeded-buildkit-service-5cdf6b4d78-
    labels:
      app.kubernetes.io/instance: buildkit-helmeded
      app.kubernetes.io/name: buildkit-service
      pod-template-hash: 5cdf6b4d78
    name: buildkit-helmeded-buildkit-service-5cdf6b4d78-bfp5c
    namespace: buildkit-helmeded
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: buildkit-helmeded-buildkit-service-5cdf6b4d78
      uid: 6e72ec99-03d9-49bc-9c7b-aef59b1f8696
    resourceVersion: "1227075325"
    uid: caff026d-94d7-405a-8003-21d7192c39c5
  spec:
    containers:
    - args:
      - --addr
      - unix:///run//buildkit/buildkitd.sock
      - --addr
      - tcp://0.0.0.0:1234
      - --debug
      image: moby/buildkit:v0.11.6
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
          - buildctl
          - debug
          - workers
        failureThreshold: 3
        initialDelaySeconds: 5
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 1
      name: buildkit-service
      ports:
      - containerPort: 1234
        name: tcp
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - buildctl
          - debug
          - workers
        failureThreshold: 3
        initialDelaySeconds: 5
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 1
      resources: {}
      securityContext:
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-cdd5v
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    nodeName: ip-10-0-16-4.eu-north-1.compute.internal
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: kube-api-access-cdd5v
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T11:15:45Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T11:16:16Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T11:16:16Z"
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2023-11-13T11:15:45Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://89409881904dbc75ebaaa7c03519f30ec3f214c2620c4f7e3aaadfc072b602af
      image: docker.io/moby/buildkit:v0.11.6
      imageID: docker.io/moby/buildkit@sha256:d6fa89830c26919acba23c5cafa09df0c3ec1fbde20bb2a15ff349e0795241f4
      lastState: {}
      name: buildkit-service
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2023-11-13T11:15:51Z"
    hostIP: 10.0.16.4
    phase: Running
    podIP: 10.0.28.251
    podIPs:
    - ip: 10.0.28.251
    qosClass: BestEffort
    startTime: "2023-11-13T11:15:45Z"
kind: List
metadata:
  resourceVersion: ""
I have exactly the same problem with buildkit v0.12.3 and k8s v1.27.8. All other namespaces work fine, but buildkit has no pod metrics.
apiVersion: v1
kind: Pod
metadata:
  generateName: buildkit-amd64-57fcbc8c94-
  labels:
    app: buildkitd
    pod-template-hash: 57fcbc8c94
  name: buildkit-amd64-57fcbc8c94-2m9ch
  namespace: buildkit
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: buildkit-amd64-57fcbc8c94
    uid: b6172e11-590c-4839-a7fb-eca0d708064b
  resourceVersion: "56285397"
  uid: 2c42dad3-31fa-45c8-a196-18bf552d604b
spec:
  containers:
  - args:
    - --addr
    - unix:///run/buildkit/buildkitd.sock
    - --addr
    - tcp://0.0.0.0:1234
    image: docker.io/moby/buildkit:buildx-stable-1@sha256:d4187a7326f20d04fafd075f80ccc5d3f8cfd4f665c6e03d158a78e4f64bf3db
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - buildctl
        - debug
        - workers
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 1
    name: buildkitd
    ports:
    - containerPort: 1234
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - buildctl
        - debug
        - workers
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: "6"
        memory: 14Gi
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/buildkit
      name: buildkit
    - mountPath: /etc/buildkit
      name: config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-58vlt
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: nodes-fsn1-6a1f21b83b0e8a35
  preemptionPolicy: Never
  priority: 1
  priorityClassName: normal
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: buildkit-amd64
    name: config
  - emptyDir: {}
    name: buildkit
  - name: kube-api-access-58vlt
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
update: applying the status/needs-investigation tag until the exact bug is identified.
Hi, hopefully this gets some attention; it's still happening with buildkit v0.13.2 and k8s v1.29.3.
We ended up locking the version to v0.11.6. Now re-checked this, as that pinned version is getting pretty old and has a bunch of vulns. Upgraded buildkit to the latest, v0.14.1. Still getting the same metrics issue. Nowadays running k8s v1.30.
Any update on this?
Same issue here
Same issue here
Solved with v0.11.2 😢
Same issue.
All other pods show metrics just fine, but buildkit pods show no metrics.
Not sure if useful, but here are the relevant metrics from a pod on the same node as buildkit. Notice there's a metric with an "image:" label present here.
Same query for the buildkit pod. Notice a glaring absence of a metric with the moby/buildkit:0.15.2 image label. Only the pause image is present.
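The comparison described above can be scripted roughly as follows. This is a sketch, not the query the commenter ran: it assumes the input is Prometheus text exposition from the kubelet's cAdvisor endpoint (for example fetched with `kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics/cadvisor"`), where container samples carry `pod` and `image` labels. The label parser is deliberately simple and does not handle escaped quotes, and the metric name, image names, and sample lines are illustrative placeholders rather than captured data.

```python
import re
from collections import defaultdict

# Matches key="value" pairs inside the {...} label block (no escaped-quote support).
_LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical sample: one healthy pod, and one pod where only the sandbox image appears.
SAMPLE = (
    'container_memory_working_set_bytes{container="app",image="example.com/app:1.0",namespace="default",pod="app-1"} 1048576\n'
    'container_memory_working_set_bytes{container="",image="example.com/pause:3.9",namespace="default",pod="app-1"} 4096\n'
    'container_memory_working_set_bytes{container="",image="example.com/pause:3.9",namespace="default",pod="buildkitd"} 4096\n'
)


def images_by_pod(exposition_text: str, metric: str = "container_memory_working_set_bytes") -> dict:
    """Map each pod name to the set of non-empty image labels seen for `metric`."""
    result = defaultdict(set)
    for line in exposition_text.splitlines():
        if not line.startswith(metric + "{"):
            continue  # skip comments, other metrics, and label-less samples
        label_block = line[len(metric):line.rfind("}") + 1]
        labels = dict(_LABEL.findall(label_block))
        pod, image = labels.get("pod"), labels.get("image")
        if pod and image:
            result[pod].add(image)
    return dict(result)


if __name__ == "__main__":
    for pod, images in sorted(images_by_pod(SAMPLE).items()):
        print(pod, sorted(images))
```

A pod whose set contains only a pause image would match the symptom reported here: the sandbox is visible to the stats pipeline, but the application container is not.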
Did some more investigation.
k8s version: EKS 1.29
OS version: AL2023 (latest)
arch: AMD64
Using this basic example: https://github.com/moby/buildkit/blob/master/examples/kubernetes/pod.privileged.yaml
Original example:
apiVersion: v1
kind: Pod
metadata:
  name: buildkitd
spec:
  containers:
    - name: buildkitd
      image: moby/buildkit:master
      readinessProbe:
        exec:
          command:
            - buildctl
            - debug
            - workers
        initialDelaySeconds: 5
        periodSeconds: 30
      livenessProbe:
        exec:
          command:
            - buildctl
            - debug
            - workers
        initialDelaySeconds: 5
        periodSeconds: 30
      securityContext:
        privileged: true
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl get pods
NAME READY STATUS RESTARTS AGE
buildkitd 1/1 Running 0 8m11s
buildkitd-011 1/1 Running 0 2m37s
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl top pods buildkitd
Error from server (NotFound): podmetrics.metrics.k8s.io "default/buildkitd" not found
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
Slightly adjusted example (the only thing changed is the buildkit version):
apiVersion: v1
kind: Pod
metadata:
  name: buildkitd-011
spec:
  containers:
    - name: buildkitd
      image: moby/buildkit:v0.11.6
      readinessProbe:
        exec:
          command:
            - buildctl
            - debug
            - workers
        initialDelaySeconds: 5
        periodSeconds: 30
      livenessProbe:
        exec:
          command:
            - buildctl
            - debug
            - workers
        initialDelaySeconds: 5
        periodSeconds: 30
      securityContext:
        privileged: true
What do you know, it works?!
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl top pods buildkitd-011
NAME CPU(cores) MEMORY(bytes)
buildkitd-011 3m 8Mi
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
But that's not all. Get this. Rootless works just fine https://github.com/moby/buildkit/blob/master/examples/kubernetes/pod.rootless.yaml
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
buildkitd-011 3m 8Mi
buildkitd-rootless 5m 10Mi
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
But that's not all. Get this. As soon as securityContext is removed from the pod.privileged.yaml it works as well:
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ diff pod.privileged.yaml pod-test.yaml
4c4,5
<   name: buildkitd
---
>   name: buildkitd-test
>
7a9
>
8a11
>
25,26d27
<       securityContext:
<         privileged: true
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
buildkitd-rootless 2m 13Mi
buildkitd-test 0m 8Mi
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
Why removing the privileged securityContext works, I have no idea. Other pods with the same securityContext setting seem to work just fine:
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl get pods -n kube-system -o yaml ebs-csi-node-45hwq | grep privil -B2
memory: 40Mi
securityContext:
privileged: true
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$ kubectl top pods -n kube-system ebs-csi-node-45hwq
NAME CPU(cores) MEMORY(bytes)
ebs-csi-node-45hwq 1m 24Mi
dcherniv@lildebbie:~/Documents/personal/git/buildkit/examples/kubernetes$
Last working version was indeed v0.11.6. On v0.12.0-rc1 the metrics are not showing.
@jedevc bump on this. I hope the supplied information is enough to get this issue going? It's less than ideal, because we cannot see the CPU and memory usage of our builder pods to fine-tune our spend. We ended up provisioning giant buildkit pods so that the builders have enough CPU/RAM.