piraeus-operator

TLS connection to external LINSTOR controller fails

Open nobleess opened this issue 8 months ago • 13 comments

Error in linstor-csi-node and linstor-csi-controller:

time="2025-03-05T08:22:37Z" level=info msg="not ready" error="Get \"https://linstor-controller:3371/v1/controller/version\": EOF" version=refs/tags/v0.2.2
time="2025-03-05T08:22:47Z" level=info msg="not ready" error="Get \"https://linstor-controller:3371/v1/controller/version\": EOF" version=refs/tags/v0.2.2
time="2025-03-05T08:22:57Z" level=info msg="not ready" error="Get \"https://linstor-controller:3371/v1/controller/version\": EOF" version=refs/tags/v0.2.2
time="2025-03-05T08:23:08Z" level=info msg="not ready" error="Get \"https://linstor-controller:3371/v1/controller/version\": EOF" version=refs/tags/v0.2.2

Versions: piraeus-operator v2.8.0, linstor-wait-api-online, piraeus-csi v1.6.4, LINSTOR 1.24-1

Setup taken from:

  • https://github.com/piraeusdatastore/piraeus-operator/issues/492#issuecomment-1649240216

nobleess avatar Mar 05 '25 08:03 nobleess

Are you sure that in your Kubernetes cluster https://linstor-controller:3371 resolves to the right address? Could you check that by running:

kubectl exec deploy/linstor-csi-controller -c linstor-wait-api-online -- getent hosts linstor-controller

I suspect the hostname is instead interpreted as some cluster-internal service name.

WanzenBug avatar Mar 05 '25 08:03 WanzenBug

root@linstor-csi-controller-5b9c8bdb8c-6plk6:/# getent hosts linstor-controller
10.11.20.1      linstor-controller

nobleess avatar Mar 05 '25 09:03 nobleess

10.11.20.1 is the VIP of linstor-controller.
When I check http://linstor-controller:3370/v1/controller/version, everything works.

nobleess avatar Mar 05 '25 09:03 nobleess

Does http://linstor-controller:3370/controller/version actually return the version? Because that would mean that HTTPS is not configured on the controller.

WanzenBug avatar Mar 05 '25 09:03 WanzenBug

Authentication works with Proxmox itself, and curl returns valid data. HTTPS is enabled on the controller. curl with authentication enabled (run from a LINSTOR satellite node):

curl -k -v -X 'GET' 'https://linstor-controller:3371/v1/controller/version' -H 'accept: application/json' --cert api-client.crt --key api-client.key --cacert ca.csr
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 10.11.20.1:3371...
* Connected to linstor-controller (10.11.20.1) port 3371 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=linstor-controller
*  start date: Mar  5 08:58:07 2025 GMT
*  expire date: Mar  3 08:58:07 2035 GMT
*  issuer: CN=linstor-api-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/1.x
> GET /v1/controller/version HTTP/1.1
> Host: linstor-controller:3371
> User-Agent: curl/7.88.1
> accept: application/json
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: origin, content-type, accept, authorization
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
< Content-Type: application/json
< Content-Length: 143
<
* Connection #0 to host linstor-controller left intact
{"version":"1.30.2","git_hash":"9b7fae3f07c1017b5de4f476d2c57491728aefbe","build_time":"2024-12-18T14:53:54+00:00","rest_api_version":"1.24.0"}

Plain HTTP against the same port:

curl -k -v -X 'GET' 'http://linstor-controller:3371/v1/controller/version' -H 'accept: application/json'
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 10.11.20.1:3371...
* Connected to linstor-controller (10.11.20.1) port 3371 (#0)
> GET /v1/controller/version HTTP/1.1
> Host: linstor-controller:3371
> User-Agent: curl/7.88.1
> accept: application/json
>
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server

nmap certificate scan:

nmap -p 3371 --script ssl-cert linstor-controller
Starting Nmap 7.93 ( https://nmap.org ) at 2025-03-05 12:22 MSK
Nmap scan report for linstor-controller (10.11.20.1)
Host is up (0.000027s latency).

PORT     STATE SERVICE
3371/tcp open  satvid-datalnk
| ssl-cert: Subject: commonName=linstor-controller
| Subject Alternative Name: DNS:linstor-controller, DNS:10.11.20.1, IP Address:10.11.20.1
| Issuer: commonName=linstor-api-ca
| Public Key type: rsa
| Public Key bits: 4096
| Signature Algorithm: sha256WithRSAEncryption
| Not valid before: 2025-03-05T08:58:07
| Not valid after:  2035-03-03T08:58:07
| MD5:   358159e75d9e167abf6dbb433cd6bcd3
|_SHA-1: 9b392324dd22dd03f59707e3188e87038affb9ab

Nmap done: 1 IP address (1 host up) scanned in 0.29 seconds

rise-serg avatar Mar 05 '25 09:03 rise-serg

Please try without curl -k. It looks like there is an issue with the CA certificate:

*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.

--cacert ca.csr is wrong: a .csr file is a certificate signing request, not a certificate. You want to set --cacert ca.crt.

WanzenBug avatar Mar 05 '25 09:03 WanzenBug
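A mix-up like ca.csr versus ca.crt can be caught before calling curl by looking at the PEM header of the file. The following is a minimal sketch, not something taken from the thread: `pem_kind` is a hypothetical helper name, and it assumes the files are PEM-encoded (DER files would be reported as unknown).

```shell
# pem_kind FILE
# Hypothetical helper: prints what kind of PEM object FILE starts with.
# Assumes PEM encoding; anything else is reported as "unknown".
pem_kind() {
    header=$(grep -e '-----BEGIN ' "$1" 2>/dev/null | head -n 1)
    case "$header" in
        # Check the request headers first, because "BEGIN CERTIFICATE"
        # is a prefix of "BEGIN CERTIFICATE REQUEST".
        *'BEGIN CERTIFICATE REQUEST'*|*'BEGIN NEW CERTIFICATE REQUEST'*)
            echo csr ;;
        *'BEGIN CERTIFICATE'*)
            echo certificate ;;
        *'PRIVATE KEY'*)
            echo private-key ;;
        *)
            echo unknown ;;
    esac
}

# Usage (file names as used in this thread):
#   pem_kind ca.crt
#   pem_kind ca.csr
```

Only a file reported as a certificate is useful as a --cacert argument.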

Sorry, that was a typo in the certificate file extension.

curl -v -X 'GET' 'https://linstor-controller:3371/v1/controller/version' -H 'accept: application/json' --cert api-client.crt --key api-client.key --cacert ca.crt
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 10.11.20.1:3371...
* Connected to linstor-controller (10.11.20.1) port 3371 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: ca.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=linstor-controller
*  start date: Mar  5 08:58:07 2025 GMT
*  expire date: Mar  3 08:58:07 2035 GMT
*  subjectAltName: host "linstor-controller" matched cert's "linstor-controller"
*  issuer: CN=linstor-api-ca
*  SSL certificate verify ok.
* using HTTP/1.x
> GET /v1/controller/version HTTP/1.1
> Host: linstor-controller:3371
> User-Agent: curl/7.88.1
> accept: application/json
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: origin, content-type, accept, authorization
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
< Content-Type: application/json
< Content-Length: 143
<
* Connection #0 to host linstor-controller left intact
{"version":"1.30.2","git_hash":"9b7fae3f07c1017b5de4f476d2c57491728aefbe","build_time":"2024-12-18T14:53:54+00:00","rest_api_version":"1.24.0"}

Proxmox still works fine with it:

drbd: pve-prod-replicated-ssd
        resourcegroup pve-prod-replicated-ssd
        content images,rootdir
        controller linstor-controller
        apicrt /etc/pve/client.crt
        apikey /etc/pve/client.key
        apica /etc/pve/ca.crt

rise-serg avatar Mar 05 '25 09:03 rise-serg

You could try testing directly with https://github.com/LINBIT/linstor-wait-until/releases/tag/v0.2.3. Set the LS_* environment variables and run ./linstor-wait-until api-online...

WanzenBug avatar Mar 05 '25 10:03 WanzenBug
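Setting the LS_* variables for such a manual test could look roughly like the sketch below. Some of it is assumption: only LS_USER_KEY and LS_USER_CERTIFICATE are named in this thread, while LS_CONTROLLERS and LS_ROOT_CA are assumed to be the matching variables for the controller URL and the CA. The variables are also assumed to hold the PEM contents rather than file paths. `load_linstor_env` is a hypothetical helper name, and the file names follow the Proxmox configuration shown above.

```shell
# load_linstor_env DIR [CONTROLLER_URL]
# Hypothetical helper: exports the LS_* variables from PEM files in DIR.
# Assumes DIR contains client.crt, client.key and ca.crt, and that the
# variables carry the PEM text itself, not a path to the file.
load_linstor_env() {
    dir=$1
    export LS_CONTROLLERS="${2:-https://linstor-controller:3371}"
    LS_USER_CERTIFICATE=$(cat "$dir/client.crt") || return 1
    LS_USER_KEY=$(cat "$dir/client.key") || return 1
    LS_ROOT_CA=$(cat "$dir/ca.crt") || return 1
    export LS_USER_CERTIFICATE LS_USER_KEY LS_ROOT_CA
}

# Usage (not executed here):
#   load_linstor_env /etc/pve && ./linstor-wait-until api-online
```

Reading the values from the same files that Proxmox uses removes one source of copy-and-paste errors when comparing the two clients.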

/up.sh
{   } <nil> Get "https://linstor-controller:3371/v1/controller/version": EOF
INFO[0000] not ready                                     error="Get \"https://linstor-controller.tages.infra:3371/v1/controller/version\": EOF" version=unknown
^C^C^C{   } <nil> context canceled
FATA[0010] context cancelled                             error="context canceled" version=unknown

nobleess avatar Mar 05 '25 11:03 nobleess

Guess there is some issue with the go TLS library? Or some setting we have not properly set 🤔

WanzenBug avatar Mar 05 '25 12:03 WanzenBug

I did some internal tests, and it seems the error="Get .... : EOF" would indicate that the LS_USER_KEY and LS_USER_CERTIFICATE variables are not set to an accepted value.

Can you confirm that "echo $LS_USER_KEY" returns the same key as /etc/pve/client.key and "echo $LS_USER_CERTIFICATE" the same certificate as /etc/pve/client.crt?

WanzenBug avatar Mar 05 '25 13:03 WanzenBug
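A comparison like this can also be scripted. The sketch below uses two hypothetical helpers that are not part of any LINSTOR tooling: `env_matches_file` compares an exported variable with the contents of a file, and `cert_matches_key` uses openssl to check that a certificate and a private key share the same public key. It assumes openssl and printenv are installed and that the key is unencrypted.

```shell
# env_matches_file VAR FILE
# Hypothetical helper: succeeds when the exported variable VAR holds
# exactly the contents of FILE (trailing newlines are ignored).
env_matches_file() {
    [ "$(printenv "$1")" = "$(cat "$2")" ]
}

# cert_matches_key CERT KEY
# Hypothetical helper: succeeds when CERT and KEY contain the same
# public key. Returns 2 when openssl cannot parse one of the files.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Usage (paths as used in this thread, not executed here):
#   env_matches_file LS_USER_CERTIFICATE /etc/pve/client.crt
#   env_matches_file LS_USER_KEY /etc/pve/client.key
#   cert_matches_key /etc/pve/client.crt /etc/pve/client.key
```

If the certificate and key were swapped between the two variables, `env_matches_file` fails for both, which makes that particular mistake easy to spot.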

Solution: https://github.com/piraeusdatastore/linstor-csi/issues/128#issuecomment-893565972. @WanzenBug, this should be added to the official documentation.

nobleess avatar Mar 07 '25 19:03 nobleess

I see. That is indeed a very specific issue. I'll see if this can somehow be checked with openssl.

WanzenBug avatar Mar 10 '25 15:03 WanzenBug