trident
trident-csi 22.04 and 22.07 will not start up on OpenShift 4.11
We see trident-csi failing to start, with these error messages in the log:
Warning Unhealthy 6m36s (x10 over 7m21s) kubelet Startup probe failed: Get "https://10.9.96.45:17546/liveness": remote error: tls: protocol version not supported
2022/08/09 09:35:21 http: TLS handshake error from 10.9.96.45:48986: tls: client offered only unsupported versions: [303]
$ oc get csv -n openshift-cnv
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v4.11.0 OpenShift Virtualization 4.11.0 kubevirt-hyperconverged-operator.v4.10.2 Succeeded
I wonder if someone has seen this issue before? How are the supported TLS versions configured for the trident client when contacting the liveness service?
Environment
- Trident version: [22.04 and 22.07]
- Trident installation flags used: [default]
- Kubernetes version: [1.24.0]
- Kubernetes orchestrator: [OpenShift v4.11]
FYI, using this on the master node does give an ok:
# curl -k https://localhost:17546/liveness --tlsv1.3
ok
As of 22.04, Trident requires all inbound connections to use TLS 1.3 and will use a minimum of TLS 1.2 for outbound connections.
https://github.com/NetApp/trident/blob/1db81b0c539f3b882aa74b310df60fd2db49b30b/config/config.go#L122-L123
You mention 22.04 and 20.04 and 20.07 in your message. Can you please confirm if you mean 22.04 and 22.07?
It sounds like your kubelet is not able to use TLS 1.3 for some reason.
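To make the log line easier to read: the bracketed value in "client offered only unsupported versions: [303]" appears to be the protocol version code in hexadecimal, which is how Go's crypto/tls prints it, so 303 would be 0x0303, i.e. TLS 1.2. The following is a minimal sketch that decodes such a line; the helper name offered_versions is made up for illustration and is not part of Trident.

```python
import re

# Wire-format protocol version codes defined by the TLS specifications.
TLS_VERSION_NAMES = {
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}

# Assumption: the values inside the brackets are space-separated hex codes.
_OFFERED = re.compile(
    r"client offered only unsupported versions: \[([0-9a-fA-F ]*)\]"
)


def offered_versions(log_line):
    """Return the TLS versions a client offered, as readable names.

    Hypothetical helper: returns an empty list if the line does not
    contain the handshake error message.
    """
    match = _OFFERED.search(log_line)
    if match is None:
        return []
    names = []
    for token in match.group(1).split():
        code = int(token, 16)
        names.append(TLS_VERSION_NAMES.get(code, f"unknown (0x{code:04x})"))
    return names


line = (
    "2022/08/09 09:35:21 http: TLS handshake error from 10.9.96.45:48986: "
    "tls: client offered only unsupported versions: [303]"
)
print(offered_versions(line))  # ['TLS 1.2']
```

Read this way, the client that ran the startup probe offered only TLS 1.2, while the server accepts nothing below TLS 1.3.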
Sorry, this was my mistake. We tried version 22.07 and 22.04.
Hi @newkit,
We haven't seen this issue when testing against an RC for OpenShift 4.11 with Trident. That said, OpenShift 4.11 is not GA as of yet and is not currently supported by Trident.
You said that you checked your master node but given the error Startup probe failed: Get "https://10.9.96.45:17546/liveness": remote error: tls: protocol version not supported
you also need to check your worker node. It appears that the error is indicating that the TLS client version on the worker node is not supported.
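One way to run that check from a worker node is to attempt a handshake pinned to a single TLS version and see which ones succeed. The sketch below uses only Python's standard ssl module; pinned_context and probe are illustrative names, and certificate verification is disabled on purpose (like curl -k), so it only tests protocol negotiation. It exercises the OpenSSL build that Python is linked against, which may not behave the same as the kubelet's TLS stack.

```python
import socket
import ssl


def pinned_context(version):
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Skip certificate checks: we only care about version negotiation.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, port, version, timeout=5.0):
    """Try one handshake; return the negotiated version or the failure."""
    ctx = pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version()
    except (ssl.SSLError, OSError) as exc:
        return f"failed: {exc}"


# Example usage (address and port taken from the probe error above):
#   for v in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
#       print(v.name, "->", probe("10.9.96.45", 17546, v))
```

If the TLS 1.2 attempt fails and the TLS 1.3 attempt succeeds from the same node, the endpoint is behaving as described and the problem lies with whatever client is limited to TLS 1.2.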
This is not new. To my knowledge, it has been the outcome whenever we tried to upgrade Trident to 22.04.0 or 22.07.0 on OpenShift 4.10.
Using OCP 4.10 and Trident 22.01.1, which works fine, I see the following when I debug the TLS negotiation via curl:
sh-4.4# curl -v https://xx.xxx.xxx.xx:17546/readiness
* Trying 10.209.241.40...
* TCP_NODELAY set
* Connected to xx.xxx.xxx.xx (xx.xxx.xxx.xx) port 17546 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
Via openssl s_client, I can see that the certificate presented looks like this:
-----BEGIN CERTIFICATE-----
MIICpTCCAgagAwIBAgIHAvyVItyeqDAKBggqhkjOPQQDBDBOMQswCQYDVQQGEwJV
UzELMAkGA1UECBMCTkMxDDAKBgNVBAcTA1JUUDEPMA0GA1UEChMGTmV0QXBwMRMw
EQYDVQQDEwp0cmlkZW50LWNhMCAXDTcwMDEwMTAwMDAwMFoYDzIwNzAwMTAxMDAw
MDAwWjBPMQswCQYDVQQGEwJVUzELMAkGA1UECBMCTkMxDDAKBgNVBAcTA1JUUDEP
MA0GA1UEChMGTmV0QXBwMRQwEgYDVQQDEwt0cmlkZW50LWNzaTCBmzAQBgcqhkjO
PQIBBgUrgQQAIwOBhgAEAFa93F2IP5A4aS1rfMmrqKuv6U+W2dspQ15C3UgA4lWI
gl0s2pNhu3ft/W6byZ+nBkEXmnOlUyDrCpua+7heUGtuAOzh/Et79NP8COC9TYHS
LlJuTCAAzXV7nz+TTqTYPP7c9eDf12TZ9hISW8QbgeDQYCMhtICuYygqDCpkTn+v
XX3to4GIMIGFMBMGA1UdJQQMMAoGCCsGAQUFBwMBMCkGA1UdDgQiBCBzwXZKWKke
/dCKTap2VEvDkaAu+3GaMyZf43yXaVQvLTArBgNVHSMEJDAigCBW6ZVuHhhFWwAb
oWO1T7vHPAUmucqlCC1Qv/5E/7xTNTAWBgNVHREEDzANggt0cmlkZW50LWNzaTAK
BggqhkjOPQQDBAOBjAAwgYgCQgCrjix+OjvSAPVx5d1mg6xUVDzcGgkQLkQK78AY
WuABJAMVAOSTOqLQMxVyFwURvM2QE1qZnNzDtE9Dg4ihF07d+wJCATDzPGnzcbP/
uDcVHnOyii1EDmmOuqGZTqlN/5B19ZDdghJIqf89ZPp48u7smomUmElvglTX9tML
mRJMAmQtkDBF
-----END CERTIFICATE-----
Which forces me to guess that prior to the 22.04.0 release TLS was perhaps not enforced.
Edit: clarity, sorry.
Hi @rsjonte,
Thanks for the detailed clarification and using OCP 4.10 as the example. This is the commit that introduced using TLS 1.3 as the minimum TLS version. This change was included with the Trident v22.04 release.
I can't say with certainty why your curl command is failing, but a common reason is that curl isn't able to verify the certificate provided by the server. You can see in the above commit that the tls.RequireAndVerifyClientCert option was not changed, so enforcement behavior shouldn't have changed.
You can contact us on Discord if you'd like to have a further discussion.
Hi @gnarl,
Thank you for your reply. I'm sorry if my comment was unclear.
The point I was trying to make was that in 22.01.1 and before, the readiness check succeeded even though the CA could not be verified. As of 22.04.0, and again in 22.07.0, the checks fail with the handshake error from @newkit's issue report.
They could of course be unrelated. What I tried to show was that 22.01.1 already uses TLS 1.3, and from openssl s_client -connect I can see that the cipher used will likely be:
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
I should add one more piece of information: the cluster I've been testing on is installed in FIPS mode, which might limit which ciphers are available.
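Since FIPS mode can restrict what a TLS library offers, it may help to compare the TLS 1.3 cipher suites available on a FIPS node and a non-FIPS node. This is a rough sketch using Python's standard ssl module; tls13_cipher_names is a made-up helper name, and the result reflects the OpenSSL build Python is linked against, not necessarily the Go TLS stack used by the kubelet or Trident.

```python
import ssl


def tls13_cipher_names(ctx=None):
    """List the TLS 1.3 cipher suites enabled in a client context."""
    if ctx is None:
        ctx = ssl.create_default_context()
    return sorted(
        cipher["name"]
        for cipher in ctx.get_ciphers()
        if cipher["protocol"] == "TLSv1.3"
    )


for name in tls13_cipher_names():
    print(name)
```

An empty list on the FIPS node would suggest that TLS 1.3 is not usable with that library configuration, which would be consistent with a client that offers only TLS 1.2.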
This is still an ongoing issue in trident-installer-22.10.0.tar.gz
2022/11/10 11:19:45 http: TLS handshake error from 10.209.241.24:54640: tls: client offered only unsupported versions: [303]
2022/11/10 11:19:50 http: TLS handshake error from 10.209.241.24:43054: tls: client offered only unsupported versions: [303]
2022/11/10 11:19:55 http: TLS handshake error from 10.209.241.24:43070: tls: client offered only unsupported versions: [303]
2022/11/10 11:20:00 http: TLS handshake error from 10.209.241.24:53866: tls: client offered only unsupported versions: [303]
2022/11/10 11:20:05 http: TLS handshake error from 10.209.241.24:53870: tls: client offered only unsupported versions: [303]
Hi @rsjonte,
If you are still seeing this issue then please open a NetApp Support case. We haven't been able to reproduce the issue you've reported in our testing.
Hi @gnarl,
I do have support cases running (both with RH and with NetApp), but it's a curveball. We have been able to upgrade our non-FIPS-compliant cluster without problems, but our FIPS-compliant clusters all exhibit this issue. It's not a critical issue, but it is a roadblock for k8s 1.25, which for us will arrive with OpenShift 4.12.
I'm guessing that organizations that run FIPS-compliant OpenShift clusters probably don't use NetApp for storage, since we seem to be alone with the issue. Or we might just be snowflakes for some unknown reason.
I'll let you know once engineering figures out the root cause.
@gnarl
We seem to have made a breakthrough. Here is a workaround for now: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Trident_doesn't_run_due_to_TLSv1.2_mismatch_on_Openshift_FIPS_setup%2C
Hi @rsjonte,
Thanks for the update. It is good to hear that you've found a workaround.
At this time, Trident doesn't test for FIPS 140-2 compliance and uses Go's standard library crypto package. We are not planning to test OpenShift releases with FIPS enabled.
Hi @rsjonte,
Thanks again for reporting the workaround on this issue. We will continue to track customer demand for Trident to support OCP with FIPS enabled. Until FIPS is supported, customers will need to implement this workaround.