
trident-csi 22.04 and 22.07 will not start up on OpenShift 4.11

Open newkit opened this issue 2 years ago • 7 comments

We see trident-csi failing to start, with these error messages in the log:

Warning  Unhealthy  6m36s (x10 over 7m21s)  kubelet            Startup probe failed: Get "https://10.9.96.45:17546/liveness": remote error: tls: protocol version not supported

2022/08/09 09:35:21 http: TLS handshake error from 10.9.96.45:48986: tls: client offered only unsupported versions: [303]
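A note on reading the second message: Go's TLS server prints the protocol versions offered by the client in hexadecimal, and 0x0303 is the wire code for TLS 1.2. So `[303]` means the client offered TLS 1.2 only. The mapping can be sketched as follows; the function name `decode_offered_versions` is made up for illustration, and the parsing assumes the log format shown above.

```python
import re

# TLS/SSL protocol version codes as they appear on the wire.
TLS_VERSION_NAMES = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def decode_offered_versions(log_line):
    """Return names for the hex version codes in the trailing [...] of a log line."""
    match = re.search(r"unsupported versions: \[([0-9a-fA-F ]*)\]", log_line)
    if match is None:
        return []
    names = []
    for code in match.group(1).split():
        value = int(code, 16)
        names.append(TLS_VERSION_NAMES.get(value, "unknown (0x%04x)" % value))
    return names


line = (
    "2022/08/09 09:35:21 http: TLS handshake error from 10.9.96.45:48986: "
    "tls: client offered only unsupported versions: [303]"
)
print(decode_offered_versions(line))
```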

$ oc get csv  -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.11.0   OpenShift Virtualization   4.11.0    kubevirt-hyperconverged-operator.v4.10.2   Succeeded

Has anyone seen this issue before? How are the supported TLS versions configured for the client that contacts the Trident liveness service?

Environment

  • Trident version: [22.04 and 22.07]
  • Trident installation flags used: [default]
  • Kubernetes version: [1.24.0]
  • Kubernetes orchestrator: [OpenShift v4.11]

newkit avatar Aug 09 '22 09:08 newkit

FYI, running this on the master node does return ok:

# curl -k https://localhost:17546/liveness  --tlsv1.3
ok

newkit avatar Aug 09 '22 10:08 newkit

As of 22.04, Trident requires all inbound connections to use TLS 1.3 and uses a minimum of TLS 1.2 for outbound connections.

https://github.com/NetApp/trident/blob/1db81b0c539f3b882aa74b310df60fd2db49b30b/config/config.go#L122-L123

You mention 22.04 and 20.04 and 20.07 in your message. Can you please confirm if you mean 22.04 and 22.07?

It sounds like your kubelet is not able to use TLS 1.3 for some reason.
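The policy described here can be modeled in a few lines. This is a simplified sketch of TLS version selection, not Trident's or Go's actual code; `select_version` and `ProtocolVersionError` are hypothetical names. It shows why a client that offers only TLS 1.2 is rejected by a server whose minimum is TLS 1.3.

```python
TLS12 = 0x0303
TLS13 = 0x0304


class ProtocolVersionError(Exception):
    """Raised when the client and server share no TLS version."""


def select_version(client_offered, server_min, server_max=TLS13):
    """Pick the highest client-offered version that the server allows."""
    acceptable = [v for v in client_offered if server_min <= v <= server_max]
    if not acceptable:
        raise ProtocolVersionError(
            "client offered only unsupported versions: [%s]"
            % " ".join("%x" % v for v in client_offered)
        )
    return max(acceptable)


# A server that requires TLS 1.3 inbound, as described for Trident 22.04+.
# A client that can speak TLS 1.3 is accepted.
print(hex(select_version([TLS12, TLS13], server_min=TLS13)))

# A client capped at TLS 1.2 is rejected.
try:
    select_version([TLS12], server_min=TLS13)
except ProtocolVersionError as err:
    print(err)
```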

adkerr avatar Aug 09 '22 12:08 adkerr

Sorry, this was my mistake. We tried version 22.07 and 22.04.

newkit avatar Aug 09 '22 13:08 newkit

Hi @newkit,

We haven't seen this issue when testing against an RC for OpenShift 4.11 with Trident. That said, OpenShift 4.11 is not GA as of yet and is not currently supported by Trident.

You said that you checked your master node, but given the error 'Startup probe failed: Get "https://10.9.96.45:17546/liveness": remote error: tls: protocol version not supported', you also need to check your worker node. The error appears to indicate that the TLS version offered by the client on the worker node is not supported.
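One way to run that check from a worker node is to probe the endpoint with and without a TLS version cap. The sketch below assumes Python 3.7+ is available on the node; `make_probe_context` and `probe` are hypothetical helper names. Like `curl -k`, it skips certificate verification, so it only tests protocol negotiation, and it reflects the node's Python/OpenSSL stack rather than the kubelet's own TLS client.

```python
import socket
import ssl


def make_probe_context(max_version=None):
    """Build a client context that skips certificate checks (like curl -k)
    and optionally caps the highest TLS version it offers."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if max_version is not None:
        ctx.maximum_version = max_version
    return ctx


def probe(host, port, max_version=None, timeout=5.0):
    """Return the negotiated TLS version string, or the error text on failure."""
    ctx = make_probe_context(max_version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    except (ssl.SSLError, OSError) as err:
        return "failed: %s" % err
```

For example, comparing `probe("10.9.96.45", 17546)` with `probe("10.9.96.45", 17546, ssl.TLSVersion.TLSv1_2)` would show whether the endpoint accepts a client limited to TLS 1.2.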

gnarl avatar Aug 09 '22 17:08 gnarl

This is not new. To my knowledge, it has been the outcome whenever we tried to upgrade Trident to 22.04.0 or 22.07.0 on OpenShift 4.10.

On OCP 4.10 with Trident 22.01.1, which works fine, I see the following when I debug the TLS negotiation via curl:

sh-4.4# curl -v https://xx.xxx.xxx.xx:17546/readiness
*   Trying 10.209.241.40...
* TCP_NODELAY set
* Connected to xx.xxx.xxx.xx (xx.xxx.xxx.xx) port 17546 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate

Via openssl s_client I can see that the certificate presented looks like this:

-----BEGIN CERTIFICATE-----
MIICpTCCAgagAwIBAgIHAvyVItyeqDAKBggqhkjOPQQDBDBOMQswCQYDVQQGEwJV
UzELMAkGA1UECBMCTkMxDDAKBgNVBAcTA1JUUDEPMA0GA1UEChMGTmV0QXBwMRMw
EQYDVQQDEwp0cmlkZW50LWNhMCAXDTcwMDEwMTAwMDAwMFoYDzIwNzAwMTAxMDAw
MDAwWjBPMQswCQYDVQQGEwJVUzELMAkGA1UECBMCTkMxDDAKBgNVBAcTA1JUUDEP
MA0GA1UEChMGTmV0QXBwMRQwEgYDVQQDEwt0cmlkZW50LWNzaTCBmzAQBgcqhkjO
PQIBBgUrgQQAIwOBhgAEAFa93F2IP5A4aS1rfMmrqKuv6U+W2dspQ15C3UgA4lWI
gl0s2pNhu3ft/W6byZ+nBkEXmnOlUyDrCpua+7heUGtuAOzh/Et79NP8COC9TYHS
LlJuTCAAzXV7nz+TTqTYPP7c9eDf12TZ9hISW8QbgeDQYCMhtICuYygqDCpkTn+v
XX3to4GIMIGFMBMGA1UdJQQMMAoGCCsGAQUFBwMBMCkGA1UdDgQiBCBzwXZKWKke
/dCKTap2VEvDkaAu+3GaMyZf43yXaVQvLTArBgNVHSMEJDAigCBW6ZVuHhhFWwAb
oWO1T7vHPAUmucqlCC1Qv/5E/7xTNTAWBgNVHREEDzANggt0cmlkZW50LWNzaTAK
BggqhkjOPQQDBAOBjAAwgYgCQgCrjix+OjvSAPVx5d1mg6xUVDzcGgkQLkQK78AY
WuABJAMVAOSTOqLQMxVyFwURvM2QE1qZnNzDtE9Dg4ihF07d+wJCATDzPGnzcbP/
uDcVHnOyii1EDmmOuqGZTqlN/5B19ZDdghJIqf89ZPp48u7smomUmElvglTX9tML
mRJMAmQtkDBF
-----END CERTIFICATE-----

This leads me to guess that, prior to the 22.04.0 release, TLS was perhaps not enforced.

Edit: clarity, sorry.

rsjonte avatar Aug 17 '22 10:08 rsjonte

Hi @rsjonte,

Thanks for the detailed clarification and for using OCP 4.10 as the example. This is the commit that introduced TLS 1.3 as the minimum TLS version. This change was included in the Trident v22.04 release.

I can't say with certainty why your curl command is failing, but a common reason is that curl isn't able to verify the certificate provided by the server. You can see in the above commit that the tls.RequireAndVerifyClientCert option was not changed, so enforcement behavior shouldn't have changed.

You can contact us on Discord if you'd like to have a further discussion.

gnarl avatar Aug 17 '22 20:08 gnarl

Hi @gnarl,

Thank you for your reply. I'm sorry if my comment was unclear.

The point I was trying to make was that in 22.01.1 and earlier, the readiness check succeeded even though the CA could not be verified. As of 22.04.0, and again in 22.07.0, the checks fail with the handshake error from @newkit's issue report.

They could of course be unrelated. What I tried to show was that in 22.01.1 we already use TLS 1.3, and from an openssl s_client -connect I can see that the cipher used will likely be:
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256

I just need to add one bit of information: the cluster I've been testing on is installed in FIPS mode, which might limit which ciphers are available.
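For anyone comparing clusters, a quick way to confirm whether a node is running in FIPS mode is to read the kernel flag. This is a minimal sketch assuming a Linux node that exposes `/proc/sys/crypto/fips_enabled`; the helper name `fips_enabled` is made up for illustration.

```python
from pathlib import Path


def fips_enabled(flag_path="/proc/sys/crypto/fips_enabled"):
    """Return True if the kernel reports FIPS mode, False otherwise
    (including when the flag file does not exist or is unreadable)."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        return False


print("kernel FIPS mode:", fips_enabled())
```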

rsjonte avatar Aug 22 '22 09:08 rsjonte

This is still an ongoing issue in trident-installer-22.10.0.tar.gz:

2022/11/10 11:19:45 http: TLS handshake error from 10.209.241.24:54640: tls: client offered only unsupported versions: [303]
2022/11/10 11:19:50 http: TLS handshake error from 10.209.241.24:43054: tls: client offered only unsupported versions: [303]
2022/11/10 11:19:55 http: TLS handshake error from 10.209.241.24:43070: tls: client offered only unsupported versions: [303]
2022/11/10 11:20:00 http: TLS handshake error from 10.209.241.24:53866: tls: client offered only unsupported versions: [303]
2022/11/10 11:20:05 http: TLS handshake error from 10.209.241.24:53870: tls: client offered only unsupported versions: [303]
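When there are many such lines, grouping them by source address helps identify which node's client is being rejected. The following sketch assumes the log format shown above and IPv4 addresses only; `failures_by_client` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches "TLS handshake error from <ipv4>:<port>: <reason>".
HANDSHAKE_RE = re.compile(
    r"TLS handshake error from (\d+\.\d+\.\d+\.\d+):\d+: (.*)$"
)


def failures_by_client(log_text):
    """Count handshake errors per (client IP, reason) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HANDSHAKE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = """\
2022/11/10 11:19:45 http: TLS handshake error from 10.209.241.24:54640: tls: client offered only unsupported versions: [303]
2022/11/10 11:19:50 http: TLS handshake error from 10.209.241.24:43054: tls: client offered only unsupported versions: [303]
"""
for (ip, reason), count in failures_by_client(sample).items():
    print(ip, count, reason)
```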

rsjonte avatar Nov 10 '22 11:11 rsjonte

Hi @rsjonte,

If you are still seeing this issue then please open a NetApp Support case. We haven't been able to reproduce the issue you've reported in our testing.

gnarl avatar Dec 07 '22 19:12 gnarl

Hi @gnarl,

I do have support cases open (with both RH and NetApp), but it's a curve ball. We have been able to upgrade our non-FIPS-compliant cluster without problems, but our FIPS-compliant ones all exhibit this issue. It's not a critical issue, but it is a roadblock for k8s 1.25, which for us will arrive with OpenShift 4.12.

I'm guessing that organizations that run FIPS-compliant OpenShift clusters probably don't use NetApp for storage, since we seem to be alone with this issue. Or we might just be snowflakes for some unknown reason.

I'll let you know once engineering figures out the root cause.

rsjonte avatar Dec 13 '22 13:12 rsjonte

@gnarl

We seem to have made a breakthrough. Here is a workaround for now: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Trident_doesn't_run_due_to_TLSv1.2_mismatch_on_Openshift_FIPS_setup%2C

rsjonte avatar Dec 13 '22 17:12 rsjonte

Hi @rsjonte,

Thanks for the update. It is good to hear that you've found a workaround.

At this time, Trident doesn't test for FIPS 140-2 compliance and uses Go's standard library crypto package. We are not planning to test OpenShift releases with FIPS enabled.

gnarl avatar Dec 20 '22 18:12 gnarl

Hi @rsjonte,

Thanks again for reporting the workaround on this issue. We will continue to track customer demand for Trident to support OCP with FIPS enabled. Until FIPS is supported, customers will need to implement this workaround.

gnarl avatar Feb 13 '23 14:02 gnarl