PV isn't created when deploying test application from docs.

Open Daxcor69 opened this issue 2 years ago • 11 comments

The test application will not deploy; it stays in Pending. Please help, I am so close to getting this working.

Cluster Setup: worker 1-3

  • these do have the mayastor engine label added
  • have the nvme-tcp kernel module enabled
  • have hugepages set
  • these do not have physical disks for the storage cluster

storage1-3

  • these do have the mayastor engine label added
  • have the nvme-tcp kernel module enabled
  • have hugepages set
  • these do have physical disks for the storage cluster

MSP's created: image

Storage Class created:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-1
parameters:
  fsType: xfs
  repl: '1'
  protocol: 'nvmf'
  ioTimeout: '60'
  local: 'false'   # this is deliberate: the pod requiring storage must be schedulable on any node
provisioner: io.openebs.csi-mayastor

Then I try to provision the test application from the docs. The PVC is created but stays in Pending, with the following errors in the CSI controller pod:

W0830 15:13:24.748827       1 topology.go:321] No topology keys found on any node
W0830 15:13:24.748840       1 controller.go:958] Retrying syncing claim "924eb8c2-d999-4aed-b601-d63cd9d5bdcb", failure 7
E0830 15:13:24.748852       1 controller.go:981] error syncing claim "924eb8c2-d999-4aed-b601-d63cd9d5bdcb": failed to provision volume with StorageClass "mayastor-1": error generating accessibility requirements: no available topology found
I0830 15:13:24.748863       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ms-volume-claim", UID:"924eb8c2-d999-4aed-b601-d63cd9d5bdcb", APIVersion:"v1", ResourceVersion:"234587", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/ms-volume-claim"
I0830 15:13:24.748871       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ms-volume-claim", UID:"924eb8c2-d999-4aed-b601-d63cd9d5bdcb", APIVersion:"v1", ResourceVersion:"234587", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "mayastor-1": error generating accessibility requirements: no available topology found

The test pod also stays in Pending because the PVC never binds. No PV is ever created. I tried manually creating a PV and the PVC did bind, but the pod then failed with this error: AttachVolume.Attach failed for volume "pv0001" : CSINode storage1 does not contain driver io.openebs.csi-mayastor

Daxcor69 avatar Aug 30 '22 16:08 Daxcor69

Hi, @Daxcor69 can we take a look at the logs of csi-node pods on the storage1 node? Seems like the csi-node pod has not come up successfully on that node.

Abhinandan-Purkait avatar Aug 30 '22 17:08 Abhinandan-Purkait

I0829 17:13:08.332189 1 main.go:113] Version: v2.1.0-0-g80d42f24
I0829 17:13:08.332607 1 connection.go:153] Connecting to unix:///csi/csi.sock
I0829 17:13:08.350705 1 node_register.go:52] Starting Registration Server at: /registration/io.openebs.csi-mayastor-reg.sock
I0829 17:13:08.351062 1 node_register.go:61] Registration Server started at: /registration/io.openebs.csi-mayastor-reg.sock
I0829 17:13:08.351419 1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""

[2022-08-29T17:13:06Z INFO mayastor_csi] Removed stale CSI socket /csi/csi.sock
[2022-08-29T17:13:06Z INFO mayastor_csi] CSI plugin bound to /csi/csi.sock
[2022-08-29T17:13:06Z INFO mayastor_csi::nodeplugin_grpc] Mayastor node plugin gRPC server configured at address 10.0.1.7:10199
[2022-08-29T17:13:08Z DEBUG mayastor_csi::identity] GetPluginInfo request (io.openebs.csi-mayastor:0.2)

Daxcor69 avatar Aug 30 '22 19:08 Daxcor69

That is everything in the logs of the two containers on storage1.

Daxcor69 avatar Aug 30 '22 19:08 Daxcor69

Can we see the CSINode object for storage1, i.e. kubectl get csinode storage1 -o yaml?

Abhinandan-Purkait avatar Aug 30 '22 19:08 Abhinandan-Purkait

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd
  creationTimestamp: "2022-08-29T14:34:41Z"
  name: storage1
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: storage1
    uid: 58f13f19-8d6c-46cc-8cb8-47fc9dc10ccf
  resourceVersion: "382"
  uid: 2f8d4689-745b-4e63-bc42-bca1aeeaa2b2
spec:
  drivers: null

Daxcor69 avatar Aug 30 '22 19:08 Daxcor69

Seems like the mayastor csi driver is not registered! Can you restart the csi-node pod on storage1 and check if that changes anything in the csinode object?
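The same check can be scripted so it is easy to repeat after each restart. This is a minimal Python sketch, assuming the CSINode object has been exported as JSON (for example with `kubectl get csinode storage1 -o json`); `driver_status` is a hypothetical helper written for this thread, not part of Mayastor or Kubernetes.

```python
import json

DRIVER = "io.openebs.csi-mayastor"  # driver name from the AttachVolume error above


def driver_status(csinode, driver=DRIVER):
    """Return (registered, topology_keys) for `driver` on one CSINode object."""
    # spec.drivers is null until a node plugin registers, so fall back to [].
    drivers = (csinode.get("spec") or {}).get("drivers") or []
    for entry in drivers:
        if entry.get("name") == driver:
            return True, list(entry.get("topologyKeys") or [])
    return False, []


if __name__ == "__main__":
    # Trimmed copy of the storage1 object posted in this thread.
    storage1 = json.loads(
        '{"metadata": {"name": "storage1"}, "spec": {"drivers": null}}'
    )
    registered, keys = driver_status(storage1)
    name = storage1["metadata"]["name"]
    print(f"{name}: registered={registered}, topologyKeys={keys}")
```

A node whose plugin has registered should report `registered=True`; a node that reports `False` matches the `drivers: null` state shown above.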

Abhinandan-Purkait avatar Aug 30 '22 20:08 Abhinandan-Purkait

I restarted all six csi-node pods and there was no change.

Daxcor69 avatar Aug 30 '22 21:08 Daxcor69

I restarted all pods in the mayastor namespace. All pods come up running and healthy. The drivers field is still null on all CSINode objects.

Daxcor69 avatar Aug 30 '22 21:08 Daxcor69

I have completely removed Mayastor from the k8s cluster and done a full reinstall. etcd-2 is still having issues: it can't find the other members. I have created the MSPs and they all show online. When I run the command requested above, I still get drivers: null in the output for the CSINode object.

The error from your test application is: failed to provision volume with StorageClass "mayastor-1": error generating accessibility requirements: no available topology found

Env: Ubuntu 22.04.1 minimal install on amd64 hardware. 8 TB SATA drives for storage use on /dev/sdb, not mounted, formatted, or partitioned. On storage nodes 1-3 I am running the k0s Kubernetes distribution (k8s 1.24.2).

No other applications or processes are running in the cluster or on the hosts.

Each host has two networks: eth0 (public) and eth1 (private), which uses 10.0.1.0/24 for the hosts. I have a firewall in place that blocks all traffic on the public network and allows 10.0.0.0/8 to any port on any system. This should cover all the networks. k0s was told to install on the public network with the following CIDRs: 10.96.0.0/16 and 10.244.0.0/12.
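The claim that 10.0.0.0/8 covers every range in use can be checked mechanically. This is a small Python sketch using the standard `ipaddress` module; the labels in `RANGES` are made up for readability, and the CIDR values are copied from this post.

```python
import ipaddress

ALLOWED = ipaddress.ip_network("10.0.0.0/8")  # range the firewall allows

# CIDRs quoted in the post. The labels are illustrative only.
RANGES = {
    "host (eth1)": "10.0.1.0/24",
    "k0s cidr 1": "10.96.0.0/16",
    "k0s cidr 2": "10.244.0.0/12",
}


def check_ranges(ranges=RANGES, allowed=ALLOWED):
    """Map each label to (normalized network, whether it lies inside `allowed`)."""
    result = {}
    for label, cidr in ranges.items():
        # strict=False accepts a CIDR whose host bits are set and masks them off.
        net = ipaddress.ip_network(cidr, strict=False)
        result[label] = (str(net), net.subnet_of(allowed))
    return result


if __name__ == "__main__":
    for label, (net, inside) in check_ranges().items():
        print(f"{label}: {net} inside {ALLOWED}: {inside}")
```

All three ranges fall inside 10.0.0.0/8. Note that 10.244.0.0/12 has host bits set, so it normalizes to 10.240.0.0/12.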

Deployment Goal: Storage nodes 1-3 each have 8 TB SATA drives that will be used for the cluster storage requirements. Worker nodes 1-3 will run workload pods that require storage from Mayastor. The storage nodes will also run workloads.

The local flag will be set to 'false' so that all nodes have access to the storage resources, no matter where the pod is scheduled.

I am happy to provide any other logs or configs that you require. I want this to work, I don't want to give up.

Daxcor69 avatar Aug 30 '22 23:08 Daxcor69

For etcd: Are you using the non-prod example yaml files for etcd? If so, did you delete the data from the nodes?

For 1.0.2 I think the local flag might be broken, as the target must always live on a storage node. A workaround could be to label worker nodes with the io engine label (but don't create any pools there).
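The labelling workaround amounts to one `kubectl label` call per worker node. This is a minimal Python sketch that only builds the commands; the label `openebs.io/engine=mayastor` and the node names `worker1` to `worker3` are assumptions (neither is spelled out in this thread), so check both against your install before running anything.

```python
import shlex

# Assumption: engine label key/value as used by Mayastor 1.0.x installs.
ENGINE_LABEL = "openebs.io/engine=mayastor"


def label_commands(nodes, label=ENGINE_LABEL):
    """Build one `kubectl label node` argv list per node name."""
    return [["kubectl", "label", "node", node, label] for node in nodes]


if __name__ == "__main__":
    # Hypothetical node names; replace with the output of `kubectl get nodes`.
    for argv in label_commands(["worker1", "worker2", "worker3"]):
        print(shlex.join(argv))
```

The script prints the commands instead of running them, so they can be reviewed first.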

In the develop branch we're in the process of removing local altogether and letting the targets run on any node as long as it's got the engine labels, decoupling them from the application nodes.

tiagolobocastro avatar Aug 31 '22 08:08 tiagolobocastro

OK, well, that was an important bit of information. Is this the release that is coming in late Sept?

Daxcor69 avatar Aug 31 '22 13:08 Daxcor69

I had the same problem. It may be solved by setting --feature-gates=Topology=false on the csi-provisioner container.

  • yaml file: csi-deployment.yaml
---
# Source: mayastor-control-plane/templates/csi-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csi-controller
  namespace: mayastor
  labels:
    app: csi-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: csi-controller
  template:
    metadata:
      labels:
        app: csi-controller
    spec:
      hostNetwork: true
      serviceAccount: mayastor-service-account
      dnsPolicy: ClusterFirstWithHostNet
      imagePullSecrets:
        - name: regcred
      initContainers:
        - command:
          - sh
          - -c
          - trap "exit 1" TERM; until nc -vz rest 8081; do echo "Waiting for REST API endpoint
            to become available"; sleep 1; done;
          image: busybox:latest
          name: rest-probe
      containers:
        - name: csi-provisioner
          image: k8s.gcr.io/sig-storage/csi-provisioner:v2.2.1
          args:
            - "--v=2"
            - "--csi-address=$(ADDRESS)"
            - "--feature-gates=Topology=false"
            - "--strict-topology=false"
            - "--default-fstype=ext4"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
        - name: csi-attacher
          image: k8s.gcr.io/sig-storage/csi-attacher:v3.2.1
          args:
            - "--v=2"
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
        - name: csi-controller
          resources:
            limits:
              cpu: 32m
              memory: 128Mi
            requests:
              cpu: 16m
              memory: 64Mi
          image: mayadata/mcp-csi-controller:v1.0.3
          imagePullPolicy: IfNotPresent
          args:
            - "--csi-socket=/var/lib/csi/sockets/pluginproxy/csi.sock"
            - "--rest-endpoint=http://rest:8081"
          env:
            - name: RUST_LOG
              value: info
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
      volumes:
        - name: socket-dir
          emptyDir: {}
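
The relevant change in the manifest above is the extra `--feature-gates=Topology=false` argument on the `csi-provisioner` container. This is a rough Python sketch of applying that change to an already-parsed Deployment; `disable_topology` is a hypothetical helper, and it assumes the manifest has been loaded into a plain dict (for example from `kubectl get deployment csi-controller -n mayastor -o json`).

```python
import copy

FLAG = "--feature-gates=Topology=false"  # flag suggested in the comment above


def disable_topology(deployment, container="csi-provisioner", flag=FLAG):
    """Return a copy of `deployment` with `flag` in the named container's args."""
    patched = copy.deepcopy(deployment)  # leave the caller's object untouched
    for spec in patched["spec"]["template"]["spec"]["containers"]:
        if spec.get("name") == container:
            args = spec.setdefault("args", [])
            if flag not in args:  # idempotent: never add the flag twice
                args.append(flag)
    return patched


if __name__ == "__main__":
    # Minimal stand-in for the Deployment, reduced to the fields that matter here.
    deployment = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "csi-provisioner", "args": ["--v=2"]},
            {"name": "csi-attacher", "args": ["--v=2"]},
        ]}}}
    }
    patched = disable_topology(deployment)
    for spec in patched["spec"]["template"]["spec"]["containers"]:
        print(spec["name"], spec["args"])
```

Only the provisioner container is touched; the attacher and controller containers keep their arguments.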

mengsuenyan avatar Oct 25 '22 08:10 mengsuenyan

@Daxcor69 were you able to try the "next release"?

tiagolobocastro avatar Jan 20 '24 21:01 tiagolobocastro

No I moved on to a different solution.

Daxcor69 avatar Jan 20 '24 22:01 Daxcor69