akka-management
SunCertPathBuilderException when using Kubernetes api discovery in OpenShift
Versions used
Akka version: 2.6.15
Akka-management version: 1.1.1
Akka-http version: 10.2.6
Expected Behavior
We run an Akka cluster that uses the Kubernetes API as its discovery mechanism. Discovery is configured roughly like this:
akka.discovery {
  kubernetes-api {
    pod-namespace = "some-namespace"
    pod-label-selector = "actorSystemName=someActorSystem"
  }
}
Discovery should work this way in a normal Kubernetes distribution, e.g. AKS, as well as in OpenShift. (It worked in OpenShift with 1.0.10.)
Actual Behavior
After upgrading to akka-management 1.1.1, discovery throws an exception when running in OpenShift CodeReady Containers.
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.validator.PKIXValidator.doBuild(Unknown Source)
at java.base/sun.security.validator.PKIXValidator.engineValidate(Unknown Source)
at java.base/sun.security.validator.Validator.validate(Unknown Source)
at java.base/sun.security.ssl.X509TrustManagerImpl.validate(Unknown Source)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(Unknown Source)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(Unknown Source)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(Unknown Source)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(Unknown Source)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(Unknown Source)
at java.base/sun.security.ssl.SSLHandshake.consume(Unknown Source)
at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(Unknown Source)
at akka.stream.impl.io.TLSActor.runDelegatedTasks(TLSActor.scala:437)
at akka.stream.impl.io.TLSActor.doUnwrap(TLSActor.scala:405)
at akka.stream.impl.io.TLSActor.doInbound(TLSActor.scala:298)
at akka.stream.impl.io.TLSActor.$anonfun$bidirectional$1(TLSActor.scala:233)
at akka.stream.impl.Pump.pump(Transfer.scala:203)
at akka.stream.impl.Pump.pump$(Transfer.scala:201)
at akka.stream.impl.io.TLSActor.pump(TLSActor.scala:52)
at akka.stream.impl.BatchingInputBuffer.enqueueInputElement(ActorProcessor.scala:97)
at akka.stream.impl.BatchingInputBuffer$$anonfun$upstreamRunning$1.applyOrElse(ActorProcessor.scala:148)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at akka.stream.impl.SubReceive.apply(Transfer.scala:19)
at akka.stream.impl.FanIn$InputBunch$$anonfun$subreceive$1.applyOrElse(FanIn.scala:244)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at akka.stream.impl.SubReceive.apply(Transfer.scala:19)
at akka.stream.impl.SubReceive.apply(Transfer.scala:15)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:189)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:188)
at akka.stream.impl.SubReceive.applyOrElse(Transfer.scala:15)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:244)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.stream.impl.io.TLSActor.aroundReceive(TLSActor.scala:52)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(Unknown Source)
at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(Unknown Source)
at java.base/java.security.cert.CertPathBuilder.build(Unknown Source)
... 47 more
Relevant logs
The exception is thrown while requesting the pods from the k8s api.
Failed k8s api request
2021-09-01T12:22:04.525+00:00|INFO|akka.discovery.kubernetes.KubernetesApiServiceDiscovery|Querying for pods with label selector: [actorSystemName=someActorSystem]. Namespace: [some-namespace]. Port: [None]
2021-09-01T12:22:04.614+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Creating pool.
2021-09-01T12:22:04.768+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - Added connection ConnectionID:1 ClientConnectionId: 96a60596-e9f9-4d7c-8e63-3f7ce58d3a28
2021-09-01T12:22:04.768+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - After adding stats (total=1, active=1, idle=0, waiting=1)
2021-09-01T12:22:04.791+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Dispatching request [GET /api/v1/namespaces/some-namespace/pods Empty] to pool
2021-09-01T12:22:04.800+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Dispatching request [GET /api/v1/namespaces/some-namespace/pods Empty]
2021-09-01T12:22:04.818+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Before event [onNewRequest] In state [Unconnected] for [53 ms]
2021-09-01T12:22:04.827+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Establishing connection
2021-09-01T12:22:04.830+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - Added connection ConnectionID:2 ClientConnectionId: dbfed338-38e7-4eac-9cbf-ed113af52428
2021-09-01T12:22:04.931+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]After event [onNewRequest] State change [Unconnected] -> [Connecting]
2021-09-01T12:22:05.048+00:00|DEBUG|akka.io.TcpOutgoingConnection|Resolving 10.217.4.1 before connecting
2021-09-01T12:22:05.056+00:00|DEBUG|akka.persistence.typed.internal.EventSourcedBehaviorImpl|Replaying events: from: 1, to: 9223372036854775807
2021-09-01T12:22:05.104+00:00|DEBUG|akka.io.SimpleDnsManager|Resolution request for 10.217.4.1 from Actor[akka://someActorSystem/system/IO-TCP/selectors/$a/3#-1300459899]
2021-09-01T12:22:05.161+00:00|DEBUG|akka.io.InetAddressDnsResolver|Request for [10.217.4.1] was not yet cached
2021-09-01T12:22:05.181+00:00|DEBUG|akka.io.TcpOutgoingConnection|Attempting connection to [/10.217.4.1:443]
2021-09-01T12:22:05.190+00:00|DEBUG|akka.io.TcpOutgoingConnection|Connection established to [10.217.4.1:443]
2021-09-01T12:22:05.219+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Connection attempt succeeded
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Before event [onConnectionAttemptSucceeded] In state [Connecting] for [290 ms]
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Slot connection was established
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (PushingRequestToConnection)]After event [onConnectionAttemptSucceeded] State change [Connecting] -> [PushingRequestToConnection]
2021-09-01T12:22:05.221+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (PushingRequestToConnection)]Before event [onRequestDispatched] In state [PushingRequestToConnection] for [0 ms]
2021-09-01T12:22:05.222+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]After event [onRequestDispatched] State change [PushingRequestToConnection] -> [WaitingForResponse]
2021-09-01T12:22:05.334+00:00|DEBUG|akka.actor.ActorSystemImpl|Outgoing request stream error javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Connection failed
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Before event [onConnectionFailed] In state [WaitingForResponse] for [113 ms]
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Ongoing request [GET /api/v1/namespaces/some-namespace/pods Empty] is failed because of [connection failure]: [PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target]
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Request [GET /api/v1/namespaces/some-namespace/pods Empty] has 5 retries left, retrying...
This is strange, because the CA certificate used for signing the certificate of the api server is mounted at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.
Akka management should pick it up to verify the certificate.
To verify this, I enabled Java SSL debug logging. This showed that a certificate is added to the trust store.
CA certificate is added to trust store
javax.net.ssl|DEBUG|01|main|2021-09-06 06:44:39.673 GMT|X509TrustManagerImpl.java:79|adding as trusted certificates (
"certificate" : {
"version" : "v3",
"serial number" : "49 D0 54 71 73 AB EA 65",
"signature algorithm": "SHA256withRSA",
"issuer" : "CN=kube-apiserver-lb-signer, OU=openshift",
"not before" : "2021-08-10 04:47:02.000 GMT",
"not after" : "2031-08-08 04:47:02.000 GMT",
"subject" : "CN=kube-apiserver-lb-signer, OU=openshift",
"subject public key" : "RSA",
"extensions" : [
{
ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:true
PathLen:2147483647
]
},
{
ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Key_Encipherment
Key_CertSign
]
},
{
ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: A0 5E F7 E1 E6 4A CC C0 5B 9F 81 50 9A 03 84 47 .^...J..[..P...G
0010: A8 6A 00 59 .j.Y
]
]
}
]}
)
However, comparing that to the certificates the api server sends during the handshake shows that the server certificate is signed by a different CA.
SSL Certificate handshake
javax.net.ssl|DEBUG|1C|ConfigurationCompartment-akka.actor.default-dispatcher-11|2021-09-06 06:44:40.178 GMT|CertificateMessage.java:366|Consuming server Certificate handshake message (
"Certificates": [
"certificate" : {
"version" : "v3",
"serial number" : "2E 1D 6B E4 3E 04 D0 BD",
"signature algorithm": "SHA256withRSA",
"issuer" : "CN=kube-apiserver-service-network-signer, OU=openshift",
"not before" : "2021-09-03 06:15:05.000 GMT",
"not after" : "2021-10-03 06:15:06.000 GMT",
"subject" : "CN=10.217.4.1",
"subject public key" : "RSA",
"extensions" : [
{
ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 57 FB 22 A7 34 9D 84 A9 BB D3 CA 59 86 88 09 B9 W.".4......Y....
0010: BF E1 D6 43 ...C
]
]
},
{
ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:false
PathLen: undefined
]
},
{
ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
serverAuth
]
},
{
ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Key_Encipherment
]
},
{
ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
DNSName: kubernetes
DNSName: kubernetes.default
DNSName: kubernetes.default.svc
DNSName: kubernetes.default.svc.cluster.local
DNSName: openshift
DNSName: openshift.default
DNSName: openshift.default.svc
DNSName: openshift.default.svc.cluster.local
DNSName: 10.217.4.1
IPAddress: 10.217.4.1
]
},
{
ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 0E 95 95 83 1B BB D7 CE F0 EA 35 5D 06 76 3F 18 ..........5].v?.
0010: 8A 36 FC BD .6..
]
]
}
]},
"certificate" : {
"version" : "v3",
"serial number" : "7D 84 A7 1F 89 16 76 5D",
"signature algorithm": "SHA256withRSA",
"issuer" : "CN=kube-apiserver-service-network-signer, OU=openshift",
"not before" : "2021-08-10 04:47:02.000 GMT",
"not after" : "2031-08-08 04:47:02.000 GMT",
"subject" : "CN=kube-apiserver-service-network-signer, OU=openshift",
"subject public key" : "RSA",
"extensions" : [
{
ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
CA:true
PathLen:2147483647
]
},
{
ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
DigitalSignature
Key_Encipherment
Key_CertSign
]
},
{
ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 57 FB 22 A7 34 9D 84 A9 BB D3 CA 59 86 88 09 B9 W.".4......Y....
0010: BF E1 D6 43 ...C
]
]
}
]}
]
)
As you can see, the certificate of the api server is signed by the kube-apiserver-service-network-signer certificate. But the certificate added to the trust store is kube-apiserver-lb-signer.
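The mismatch can be illustrated with a small name-only check. This is a simplified sketch: the class and method names are hypothetical, and it only compares distinguished names, whereas real PKIX validation also verifies signatures, validity dates, and constraints.

```java
import java.util.Collection;
import javax.security.auth.x500.X500Principal;

// Hypothetical helper for illustration; not part of akka-management or the JDK.
class IssuerCheck {

    // Returns true if the issuer name of a server certificate matches the subject
    // of at least one certificate in the trust store. Name comparison only.
    static boolean issuerIsTrusted(X500Principal issuer,
                                   Collection<X500Principal> trustedSubjects) {
        return trustedSubjects.contains(issuer);
    }
}
```

With the names from the logs, the issuer `CN=kube-apiserver-service-network-signer, OU=openshift` is not found in a trust store that holds only `CN=kube-apiserver-lb-signer, OU=openshift`, which is consistent with the PKIX failure.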
Then I noticed that, in the case of OpenShift, the /var/run/secrets/kubernetes.io/serviceaccount/ca.crt file contains multiple certificates. These are the subjects of the certificates in the order they occur in the file.
- OU = openshift, CN = kube-apiserver-lb-signer
- OU = openshift, CN = kube-apiserver-localhost-signer
- OU = openshift, CN = kube-apiserver-service-network-signer
- CN = openshift-kube-apiserver-operator_localhost-recovery-serving-signer@1628571796
- CN = *.apps-crc.testing
- CN = ingress-operator@1628571830
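A list like this can be produced with the JDK alone. The following is a minimal sketch, assuming the file is a plain concatenation of PEM certificates; the class and method names are hypothetical, and the exact formatting of the printed names may differ from the list above.

```java
import java.io.InputStream;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of akka-management.
class CaBundleSubjects {

    // Returns the subject name of every X.509 certificate in the stream,
    // in the order the certificates are read.
    static List<String> subjects(InputStream in) {
        try {
            CertificateFactory factory = CertificateFactory.getInstance("X.509");
            List<String> result = new ArrayList<>();
            // The plural generateCertificates reads every certificate in the stream.
            for (Certificate cert : factory.generateCertificates(in)) {
                result.add(((X509Certificate) cert).getSubjectX500Principal().getName());
            }
            return result;
        } catch (CertificateException e) {
            throw new IllegalStateException("Could not parse certificate bundle", e);
        }
    }
}
```

Inside a pod, it could be called with `Files.newInputStream(Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"))` and the result printed line by line.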
So while the api server's certificate is signed by the third certificate, akka management only loads the first certificate from the file. As a result, the certificate path cannot be validated, because kube-apiserver-service-network-signer was not added to the trust store.
In the case of the AKS cluster we are running, the ca.crt file contains only one certificate. That's why discovery works there but not in OpenShift.
I have a working patch for this issue. I will create a PR as soon as I have my employer's agreement to sign the CLA.
In the meantime, this is the root cause:
The certificate is loaded by the PemManagersProvider.
https://github.com/akka/akka-management/blob/5557e2c3de229efca86451187949b8687b9059b6/management-pki/src/main/scala/akka/pki/kubernetes/PemManagersProvider.scala#L62-L65
certFactory#generateCertificate calls sun.security.provider.X509Factory#engineGenerateCertificate. This loads only the first certificate from the file and returns it. To load all certificates from the file, certFactory#generateCertificates should be called instead.
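As an illustration of the idea, and not the actual patch, loading every certificate from the bundle into a trust store could look like the sketch below. The class name, method name, and the `ca-N` alias scheme are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Hypothetical sketch; the real fix belongs in PemManagersProvider.
class CaBundleTrustStore {

    // Builds an in-memory trust store containing every certificate in the PEM stream.
    static KeyStore fromPem(InputStream pem) {
        try {
            CertificateFactory factory = CertificateFactory.getInstance("X.509");
            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
            trustStore.load(null, null); // initialise an empty store
            int index = 0;
            // generateCertificates (plural) returns all certificates, whereas
            // generateCertificate (singular) reads only one.
            for (Certificate cert : factory.generateCertificates(pem)) {
                trustStore.setCertificateEntry("ca-" + index++, cert);
            }
            return trustStore;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not build trust store", e);
        }
    }
}
```

The resulting KeyStore can then be passed to `TrustManagerFactory#init(KeyStore)` so that every CA in the file, including kube-apiserver-service-network-signer, becomes a trust anchor.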