TLS handshake error from API server
Description
Observed Behavior:
```
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:06:16.304Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:40666: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-hzfgs controller {"level":"ERROR","time":"2024-08-30T08:07:18.550Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:58290: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:07:18.571Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:55794: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:07:18.572Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:55792: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-hzfgs controller {"level":"ERROR","time":"2024-08-30T08:08:10.419Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:43424: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:08:10.427Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:52314: EOF\n","commit":"62a726c"}
```
Expected Behavior: No errors :)
Reproduction Steps (Please include YAML): Karpenter runs on Fargate in the karpenter namespace. These messages started to appear after upgrading to 1.0.1.
Versions:
- Chart Version: 1.0.1
- Kubernetes Version (`kubectl version`): 1.30
Fixed with:
```yaml
webhook:
  enabled: false
```
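For reference, a minimal sketch of applying that value with Helm (the release name, namespace, and chart version below are assumptions based on a standard install; adjust to your setup):

```bash
# A sketch, not a verified fix procedure: re-apply the chart with the
# webhook disabled. Release name, namespace, and version are assumptions.
helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.1 \
  --namespace karpenter \
  --reuse-values \
  --set webhook.enabled=false
```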
I don't think this issue should be closed. I am seeing a similar error in my log messages and require the webhook to remain enabled to facilitate the conversion of my resources to the latest API version.
I agree with @levinedaniel. What is the reason to mark the solution as closed with
```yaml
webhook:
  enabled: false
```
The webhook is broken.
Same, v1.0.2. Please re-open.
Is disabling the webhook an OK solution, or will some functionality not work?
cc @sknmi message above
@Hronom reopened :)
Also seeing this issue after upgrading to v0.37.3.
Saw this issue on 0.37.3 and 1.0.1
Seeing same in 1.0.2
Edit: the findings below are incorrect.
Here is my observation. Please let me know if this is incorrect:
Karpenter does not provide a client CA bundle, as we can see from here.
When I look at the CRD in my cluster, I can see that it has been injected with a caBundle:
```yaml
webhook:
  clientConfig:
    caBundle: Redacted...
    service:
      name: karpenter
      namespace: karpenter
      path: /conversion/karpenter.sh
      port: 8443
  conversionReviewVersions:
  - v1beta1
  - v1
group: karpenter.sh
```
I believe this is happening through ca-injector. So the client config for this webhook has a caBundle specified, but Karpenter uses knative to inject certificate data into the karpenter-cert secret, which comes from here.
This means the CA on the CRD and the CA used by the webhook do not match, hence the error. If this is correct, then maybe we can look at possible solutions.
I am still not sure how the CA bundle is injected into the CRD, and I did see at one point that the CA bundle in the secret vs. the CRD was different.
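If you want to check this yourself, a hedged sketch for comparing the two CA blobs directly (the secret name and the ca-cert.pem key follow knative's conventions and the chart defaults, so verify them in your cluster first):

```bash
# A sketch: compare the CA the CRD advertises with the CA knative wrote into
# the cert secret. Secret name, key, and namespace are assumptions.
kubectl get crd nodeclaims.karpenter.sh \
  -o jsonpath='{.spec.conversion.webhook.clientConfig.caBundle}' | sha256sum
kubectl get secret karpenter-cert -n karpenter \
  -o jsonpath='{.data.ca-cert\.pem}' | sha256sum
# Matching hashes mean the CRD and the webhook trust the same CA.
```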
This appears to be the same issue we saw with our defaulting/validating webhooks previously; the original issue was closed out when those webhooks were disabled by default: https://github.com/kubernetes-sigs/karpenter/issues/718. I've been able to reproduce it, and as with that issue there does not appear to be any actual impact to Karpenter's operation, so the errors can be safely ignored.
From the original issue:
> These TLS errors appear to be related to https://github.com/kubernetes/kubernetes/issues/109022, which states that these handshake errors may be generated by a caching mechanism in the standard library that causes TLS errors on cert rotation.
@liafizan are you still running into this? The cert is injected by knative, and I've been unable to reproduce. If you're still encountering this, I'd recommend opening a separate issue. I don't think it's related to the TLS errors we're seeing here.
> I am still not sure how the CA bundle is injected into the CRD, and I did see at one point that the CA bundle in the secret vs. the CRD was different.
I'm going to mark this issue as solved for now, but let us know if any of you believe this issue is impacting Karpenter's ability to operate.
Hello @jmdeal,
After upgrading to minor version 0.37.5 to enable the deletion of webhooks when deployed with ArgoCD, I see two things:
- First, the validating and mutating webhooks are now properly deleted using ArgoCD.
- Second, my CRDs are not at version v1 and are still at v1beta1, so IMO the TLS handshake error is causing the conversion webhook to fail, which is a problem for the Karpenter migration to v1.0.x.
```
$ kubectl get crd nodeclaims.karpenter.sh -o jsonpath='{.spec.versions[*].name}'
v1 v1beta1
```
So both versions exist in the cluster. Therefore the TLS handshake error in my case seems to prevent the conversion webhook from performing the v1 migration. I checked the logs inside the controller, and that is all I got from the webhook ...
> Second, my CRDs are not at version v1 and are still at v1beta1, so IMO the TLS handshake error is causing the conversion webhook to fail
This doesn't indicate any issue with the conversion webhook. If you're on any pre-1.0 version with the conversion webhooks, the storage version is still v1beta1; the conversion webhooks only exist on those versions to enable rollback from v1.0. Also, once you upgrade to v1, both versions will still be present on the CRD; one isn't automatically removed once all stored resources are converted. Instead, you want to look at .status.storedVersions on the CRDs. On v1.0.5+, Karpenter will remove v1beta1 from the stored versions once all CRs have been successfully migrated.
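A quick way to check this on your own cluster (a sketch; the nodeclaims CRD is just one example, so repeat for the other Karpenter CRDs):

```bash
# Check which API versions are actually stored, rather than merely served.
kubectl get crd nodeclaims.karpenter.sh -o jsonpath='{.status.storedVersions}'
# Expect ["v1"] once migration is complete; ["v1beta1","v1"] otherwise.
```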
@jmdeal thank you for your answer. I misunderstood the conversion webhook and thought it was the other way around; thanks for the clarification!
We are seeing this same behavior after upgrading from 0.37.0 to 1.0.3 (with an intermediate upgrade to 0.37.3 during the process). The error seems to be innocuous, but I wanted to check whether there is any impact to Karpenter's core functionality.
I upgraded from 0.37.5 to 1.0.6 and still see this issue. I had the webhook enabled in 0.37.5, and this error is from Karpenter 1.0.6:
```
{"level":"ERROR","time":"2024-10-09T14:27:06.147Z","logger":"webhook","message":"http: TLS handshake error from 10.214.2.206:34084: EOF\n","commit":"6174c75"}
{"level":"ERROR","time":"2024-10-09T14:27:06.319Z","logger":"webhook","message":"http: TLS handshake error from 10.214.60.56:40108: EOF\n","commit":"6174c75"}
```
+1
I think this issue is caused by the conversion webhook configured on the CRDs (I have already had a hard time with these in #6818). I use Pulumi transforms to remove them, and the error is gone:
```typescript
transforms: [
  ({ props, opts, type }) => {
    if (type === "kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition") {
      // Disable Karpenter conversion webhooks, which were only useful when
      // upgrading to v1 and now cause errors
      props.spec.conversion = undefined;
      return { props, opts };
    }
    return undefined;
  },
]
```
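For clusters not managed with Pulumi, a rough shell equivalent would be the following untested sketch, with the same caveat discussed above: only drop the conversion stanza once you no longer need v1beta1 round-tripping. The CRD list assumes the AWS provider.

```bash
# A sketch: strip the conversion webhook stanza from the Karpenter CRDs so
# the API server falls back to the default "None" conversion strategy.
# The patch fails harmlessly if the stanza is already absent.
for crd in nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws; do
  kubectl patch crd "$crd" --type=json \
    -p='[{"op": "remove", "path": "/spec/conversion"}]'
done
```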
Hi, I did the Karpenter version upgrade from v0.33.10 to v1.0.3 following the upgrade guide, https://karpenter.sh/docs/upgrading/v1-migration/#upgrade-procedure, but as mentioned above by others, I ran into the TLS error, without any impact on Karpenter's functionality:
```
{"level":"ERROR","time":"2024-11-01T05:16:43.587Z","logger":"webhook","message":"http: TLS handshake error from 100.x.x.x:32858: read tcp 100.x.x.x:8443->100.x.x.x:32858: read: connection reset by peer\n","commit":"688ea21"}
{"level":"ERROR","time":"2024-11-01T05:16:43.590Z","logger":"webhook","message":"http: TLS handshake error from 100.x.x.x:32876: read tcp 100.x.x.x:8443->100.x.x.x:32876: read: connection reset by peer\n","commit":"688ea21"}
```
I was able to silence the errors by disabling the webhook with DISABLE_WEBHOOK=true, but as mentioned in the thread below, I am also unsure of the repercussions of this:
https://github.com/kubernetes-sigs/karpenter/issues/718#issuecomment-2447546036
Following the discussions in the threads, I believe these webhooks are necessary to migrate the API from v1beta1 to v1 in a future release. Can someone comment on this?
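For anyone wanting the same effect through the chart rather than editing the deployment by hand, a sketch (it assumes the chart exposes a controller.env passthrough for extra environment variables, and that your Helm version supports --set-json):

```bash
# A sketch: set DISABLE_WEBHOOK via the chart's extra-env hook; equivalent
# in effect to webhook.enabled=false. controller.env support is an assumption.
helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --reuse-values \
  --set-json 'controller.env=[{"name":"DISABLE_WEBHOOK","value":"true"}]'
```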
I would like to hear clarification about this from the developers. Specifically, what is the recommended approach if you use the latest version of Karpenter?
I still don't understand what the webhooks are used for and whether I need to keep them enabled on the latest version of Karpenter.
One of the things I notice is that if we run a single replica of Karpenter, this error goes away. Not a recommendation, but reporting an observation if it helps the investigation.
I hope this error will be gone with the update to 1.1, which should support only the v1 API.
Are the devs dead, or why are they not responding? What are these webhooks, and is disabling them safe?
The issue still exists after upgrading to Karpenter v1.1.1; it is quite misleading and pollutes our logs. Would you recommend turning off the webhook?
```
"message":"http: TLS handshake error from [2a05:d014:3b8:5c05::221b]:41978: EOF\n","commit":"a2875e3"}
```
This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.
How will maintainers know about an issue if it is auto-closed?