
How can i use CSI with an existing LINSTOR Cluster, managed outside Kubernetes?

Open lomonosov77 opened this issue 2 years ago • 14 comments

Hello! I have a k8s cluster and I want to use the LINSTOR CSI plugin with an existing LINSTOR cluster managed outside Kubernetes. I've used this manifest: https://github.com/piraeusdatastore/linstor-csi/blob/master/examples/k8s/deploy/linstor-csi-1.19.yaml , but the KUBE_NODE_NAME value equals the Kubernetes node name, which is not the same as my LINSTOR node name. If anyone knows how to set this up, please help!

lomonosov77 avatar Jun 22 '23 07:06 lomonosov77

That will be difficult. Why are your host names different between kubernetes and the host OS?

WanzenBug avatar Jun 23 '23 10:06 WanzenBug

LINSTOR is installed on a different server and is not managed by the k8s cluster. They are two separate machines.

lomonosov77 avatar Jul 01 '23 09:07 lomonosov77

You still need at least a satellite running on the kubernetes workers. You can use the operator to connect to your existing cluster: https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/external-controller.md

WanzenBug avatar Jul 04 '23 07:07 WanzenBug

I tried that. And how do I add linstor satellites that are outside of the Kubernetes cluster?

lomonosov77 avatar Jul 05 '23 13:07 lomonosov77

Well, you can start by setting up the Controller and the external satellites, including registering the satellites with linstor node create.

Then you can use the Operator to connect to the existing cluster. The Operator will register the satellites it controls in the Kubernetes cluster.
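For illustration, registering the external satellites on the controller could look like the following. This is only a sketch: the node names and addresses are placeholders, and you should verify the exact linstor node create syntax against your client version.

```shell
# On the external LINSTOR controller (or any host with the linstor client
# configured to reach it): register each non-Kubernetes satellite node.
# Node names and IP addresses below are placeholders.
linstor node create stor01 192.0.2.32 --node-type satellite
linstor node create stor02 192.0.2.26 --node-type satellite

# Verify that the satellites connect and come online.
linstor node list
```

Satellites inside the Kubernetes cluster do not need this step; the Operator registers those itself once it is pointed at the external controller.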

WanzenBug avatar Jul 10 '23 06:07 WanzenBug

The whole problem is that the satellite pod won't start. The problem is in drbd-shutdown-guard. Here is the container log:

2023/07/20 16:20:12 Running drbd-shutdown-guard version v1.0.0
2023/07/20 16:20:12 Creating service directory '/run/drbd-shutdown-guard'
2023/07/20 16:20:12 Copying drbdsetup to service directory
2023/07/20 16:20:12 Copying drbd-shutdown-guard to service directory
2023/07/20 16:20:12 Optionally: relabel service directory for SELinux
2023/07/20 16:20:12 ignoring error when setting selinux label: exit status 127
2023/07/20 16:20:12 Creating systemd unit drbd-shutdown-guard.service in /run/systemd/system
2023/07/20 16:20:12 Reloading systemd
Error: failed to reload systemd
Usage: drbd-shutdown-guard install [flags]

Flags:
  -h, --help   help for install

2023/07/20 16:20:12 failed: failed to reload systemd

lomonosov77 avatar Jul 20 '23 16:07 lomonosov77

Same issue with drbd-shutdown-guard on Ubuntu 22.04 LTS

x86128 avatar Jul 24 '23 11:07 x86128

See https://github.com/piraeusdatastore/piraeus-operator/issues/426#issuecomment-1465934727 for a workaround, and the overall issue

WanzenBug avatar Jul 24 '23 11:07 WanzenBug

Thanks, the recipe from https://github.com/piraeusdatastore/piraeus-operator/issues/426#issuecomment-1465934727 helps. I'm using an external linstor-controller with API TLS encryption. Is it possible to configure the LinstorCluster resource to use API client certs? I'd build the LinstorCluster this way:

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  externalController:
    url: https://my-controller-ip:3371
  apiTLS:
    apiSecretName: linstor-api-tls
    clientSecretName: linstor-client-tls
    csiControllerSecretName: linstor-client-tls
    csiNodeSecretName: linstor-client-tls

with secrets set according to guide https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/api-tls.md#provision-keys-and-certificates-using-openssl

but in the logs of linstor-wait-node-online I get lines like this:

time="2023-07-24T12:38:58Z" level=info msg="not ready" error="Get \"https://my-controller-ip:3371/v1/nodes/ds1-d-master03\": EOF" version=refs/tags/v0.2.1

x86128 avatar Jul 24 '23 12:07 x86128

Can you verify that the linstor-client-tls secret is in use by the linstor-wait-node-online container? It should be set via environment variables.

WanzenBug avatar Jul 24 '23 14:07 WanzenBug

I exec'd into the linstor-wait-node-online container to check the env vars and connectivity:

# check env
env | grep "^LS"
LS_USER_CERTIFICATE=-----BEGIN CERTIFICATE-----
LS_CONTROLLERS=https://my-controller-ip:3371
LS_USER_KEY=-----BEGIN RSA PRIVATE KEY-----
LS_ROOT_CA=-----BEGIN CERTIFICATE-----

# check connectivity
apt install curl

echo "$LS_USER_CERTIFICATE" > client.crt
echo "$LS_USER_KEY" > client.key
echo "$LS_ROOT_CA" > ca.crt
curl -I --cacert ca.crt --cert client.crt --key client.key $LS_CONTROLLERS
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: origin, content-type, accept, authorization
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
Content-Length: 294
Content-Type: text/plain

# install linstor client utility
apt install gpg
curl -o /tmp/package-signing-pubkey.asc https://packages.linbit.com/package-signing-pubkey.asc

gpg --yes -o /etc/apt/trusted.gpg.d/linbit-keyring.gpg --dearmor /tmp/package-signing-pubkey.asc

PVERS=7 && echo "deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg] \
 http://packages.linbit.com/public/ proxmox-$PVERS drbd-9" | tee -a /etc/apt/sources.list.d/linbit.list
deb [signed-by=/etc/apt/trusted.gpg.d/linbit-keyring.gpg]  http://packages.linbit.com/public/ proxmox-7 drbd-9

apt update && apt install linstor-client

# run 
linstor n l
Error: Error reading response from https://my-controller-ip:3371: Remote end closed connection without response
Error: Unable to connect to linstor://localhost:3370: [Errno 111] Connection refused

# run with explicit args
linstor --certfile client.crt --key client.key --cafile ca.crt --controllers my-controller-ip:3371 n l
╭─────────────────────────────────────────────────────────────╮
┊ Node       ┊ NodeType  ┊ Addresses                ┊ State   ┊
╞═════════════════════════════════════════════════════════════╡
┊ stor01     ┊ SATELLITE ┊ x.x.x.32:3366 (PLAIN)    ┊ Online  ┊
┊ stor       ┊ SATELLITE ┊ x.x.x.27:3366 (PLAIN)    ┊ Online  ┊
┊ worker01   ┊ SATELLITE ┊ x.x.x.225:3366 (PLAIN)   ┊ Online  ┊
┊ worker02   ┊ SATELLITE ┊ x.x.x.226:3366 (PLAIN)   ┊ Online  ┊
┊ worker03   ┊ SATELLITE ┊ x.x.x.227:3366 (PLAIN)   ┊ Online  ┊
┊ worker04   ┊ SATELLITE ┊ x.x.x.228:3366 (PLAIN)   ┊ Online  ┊
┊ worker05   ┊ SATELLITE ┊ x.x.x.229:3366 (PLAIN)   ┊ EVICTED ┊
┊ stor02     ┊ SATELLITE ┊ x.x.x.26:3366 (PLAIN)    ┊ Online  ┊
┊ stor03     ┊ SATELLITE ┊ x.x.x.37:3366 (PLAIN)    ┊ Online  ┊
┊ stor04     ┊ SATELLITE ┊ x.x.x.34:3366 (PLAIN)    ┊ Online  ┊
╰─────────────────────────────────────────────────────────────╯

x86128 avatar Jul 25 '23 07:07 x86128

> See #426 (comment) for a workaround, and the overall issue

This method helps to start the LinstorSatellite, but it raises the question: is the drbd-shutdown-guard container unnecessary, if the solution is to remove it?

lomonosov77 avatar Jul 26 '23 08:07 lomonosov77

It does have a purpose, namely improving DRBD handling during shutdown:

By default, DRBD volumes created by Piraeus suspend IO if the connection is lost. During node shutdown, the DRBD devices remain configured and mounted, unless the node was properly evicted in Kubernetes first.

But during shutdown the Pod network stops working, at which point DRBD can no longer reach its peers, so IO is suspended. Eventually systemd comes around and tries to unmount all remaining mounts, including those for containers using Piraeus volumes. The unmount then gets stuck, because DRBD is suspending IO.

You would have to do a hard reset, as most systemd configurations do not set an unmount timeout by default. The shutdown-guard runs during node shutdown and forces the DRBD devices to report IO errors instead, so unmounting can continue.

So while not strictly necessary, it's definitely "nice to have".
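As an aside, the suspend-versus-error trade-off described above can also be tuned through LINSTOR's DRBD option pass-through. A hedged sketch, assuming your LINSTOR release whitelists the DrbdOptions/Resource/on-no-quorum property (check the LINSTOR user guide for your version); the resource name is a placeholder:

```shell
# Make DRBD return IO errors instead of suspending IO when quorum is lost,
# for a hypothetical resource definition named "my-res".
linstor resource-definition set-property my-res \
    DrbdOptions/Resource/on-no-quorum io-error
```

Note this changes the behavior permanently, whereas drbd-shutdown-guard only flips to IO errors during node shutdown.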

WanzenBug avatar Jul 26 '23 08:07 WanzenBug

> I'd entered into linstor-wait-node-online to check envs and connectivity: […]

I have the same problem:

time="2023-07-26T08:03:28Z" level=info msg="not ready" error="satellite srv-k3s-w-02 is not ONLINE: OFFLINE" version=refs/tags/v0.2.1

I tried your method, but it didn't work in my case. Nodes are still offline.

 curl -I --cacert ca.crt --cert client.crt --key client.key $LS_CONTROLLERS
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: origin, content-type, accept, authorization
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS, HEAD
Content-Length: 294
Content-Type: text/plain

and

linstor --certfile client.crt --key client.key --cafile ca.crt --controllers $LS_CONTROLLERS n l
╭───────────────────────────────────────────────────────────────────╮
┊ Node         ┊ NodeType  ┊ Addresses                    ┊ State   ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ srv2         ┊ SATELLITE ┊ 192.168.144.2:3366 (PLAIN)   ┊ Online  ┊
┊ srv-k3s-w-02 ┊ SATELLITE ┊ 192.168.130.206:3366 (PLAIN) ┊ OFFLINE ┊
┊ srv-k3s-w-03 ┊ SATELLITE ┊ 192.168.130.207:3366 (PLAIN) ┊ OFFLINE ┊
╰───────────────────────────────────────────────────────────────────╯

lomonosov77 avatar Jul 26 '23 08:07 lomonosov77

Same issue with LINSTOR v1.29.2 and Piraeus Operator v2.7.1.

quinhn-vnp avatar Dec 10 '24 07:12 quinhn-vnp

@WanzenBug can you raise the criticality?

nobleess avatar Mar 05 '25 08:03 nobleess

Since this is a mishmash of different issues, I'm going to go ahead and close this.

If you just need a pointer on how to set up LINSTOR with an external cluster, see here.

If the nodes show up in the LINSTOR list, but are offline: check your network connectivity from outside the cluster into k8s.
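A quick way to check that connectivity, assuming the default LINSTOR ports (3370/3371 for the controller API, 3366 for satellites); the hostnames below are placeholders:

```shell
# From a k8s worker: can we reach the external controller's REST API?
# (-k skips certificate verification; use --cacert/--cert/--key with API TLS.)
curl -k https://my-controller-ip:3371/v1/controller/version

# From the external controller host: can it reach the satellite port on
# each k8s worker? (nc -z only tests whether the TCP port is open.)
nc -z -w 3 worker01 3366 && echo "satellite port reachable"
```

If the controller cannot open a TCP connection to port 3366 on the workers, the satellites will stay OFFLINE even though everything inside the cluster looks healthy.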

If it's something with drbd-shutdown-guard, report to #426.

WanzenBug avatar Mar 05 '25 08:03 WanzenBug