
CDI cloneStrategy: csi-clone doesn't appear to work

Open · k8scoder192 opened this issue 2 years ago • 13 comments

What happened:

1. You cannot set "cloneStrategyOverride" to "csi-clone" because it is not an allowed value in the CRD (the options are clone or snapshot); this needs to be fixed and made a valid option.

2. When "cloneStrategyOverride" is not set in the CDI CR and "cloneStrategy: csi-clone" is set in the appropriate StorageProfile, the status shows "csi-clone", but what is actually happening is a network clone. This is verified via:

kubectl get dv ubuntu22-trident-v2 -o yaml -n test1|grep -B2 -i clonet
metadata:
  annotations:
    cdi.kubevirt.io/cloneType: network     <-------------------

What you expected to happen: Setting "cloneStrategy: csi-clone" in the StorageProfile should enable and perform a CSI clone.

FYI: I was able to successfully perform a CSI clone manually via the procedure below (sketched after the list). This leads me to believe something is wrong in the CDI logic that checks whether a CSI clone is possible:

  1. PVC-PVC clone
  2. PVC Object Transfer
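
For reference, here is a minimal sketch of that manual procedure, using the source PVC and storage class from this report. The intermediate clone name and the ObjectTransfer name are illustrative, and the ObjectTransfer shape is as I understand it from the CDI docs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ubuntu22-trident-clone      # hypothetical intermediate clone
  namespace: reference-images
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: trident-csi-volume
  resources:
    requests:
      storage: 10Gi
  dataSource:                       # CSI volume cloning (PVC-PVC)
    kind: PersistentVolumeClaim
    name: ubuntu22-trident
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: ObjectTransfer
metadata:
  name: clone-transfer              # hypothetical name
spec:
  source:
    kind: PersistentVolumeClaim
    namespace: reference-images
    name: ubuntu22-trident-clone
  target:
    namespace: test1
    name: ubuntu22-trident-v2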

How to reproduce it (as minimally and precisely as possible):

1. Ensure "cloneStrategyOverride" is NOT set in the CDI CR:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  annotations:
    cdi.kubevirt.io/configAuthority: ""
  creationTimestamp: "2023-05-21T14:47:41Z"
  finalizers:
  - operator.cdi.kubevirt.io
  generation: 13
  name: cdi
  resourceVersion: "1502710"
  uid: 25297a88-9638-4180-906f-9d30e277ab20
spec:
  config:
    podResourceRequirements:
      limits:
        cpu: 600m
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 250Mi
  imagePullPolicy: Always
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux
status:
  conditions:
  - lastHeartbeatTime: "2023-05-21T14:49:01Z"
    lastTransitionTime: "2023-05-21T14:49:01Z"
    message: Deployment Completed
    reason: DeployCompleted
    status: "True"
    type: Available
  - lastHeartbeatTime: "2023-05-21T14:49:01Z"
    lastTransitionTime: "2023-05-21T14:49:01Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2023-05-21T18:49:51Z"
    lastTransitionTime: "2023-05-21T18:49:51Z"
    status: "False"
    type: Degraded
  observedVersion: v1.54.2
  operatorVersion: v1.54.2
  phase: Deployed
  targetVersion: v1.54.2

2. Set cloneStrategy to csi-clone in the appropriate StorageProfile

apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  creationTimestamp: "2023-05-21T14:48:50Z"
  generation: 11
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    cdi.kubevirt.io: ""
  name: trident-csi-volume
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CDI
    name: cdi
    uid: 25297a88-9638-4180-906f-9d30e277ab20
  resourceVersion: "1491139"
  uid: 5d2ea494-e876-4ef5-848d-269d3117b2f0
spec:
  claimPropertySets:
  - accessModes:
    - ReadWriteMany
    volumeMode: Filesystem
  cloneStrategy: csi-clone                  <-------- set 
status:
  claimPropertySets:
  - accessModes:
    - ReadWriteMany
    volumeMode: Filesystem
  cloneStrategy: csi-clone                          <----- status confirms
  provisioner: csi.trident.netapp.io
  storageClass: trident-csi-volume                     
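
FWIW, the same setting can be applied with a one-line merge patch (a sketch; the StorageProfile is cluster-scoped):

kubectl patch storageprofile trident-csi-volume --type merge \
  -p '{"spec": {"cloneStrategy": "csi-clone"}}'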

3. Apply the smart-clone DataVolume

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu22-trident-v2
  namespace: test1
spec:
  source:
    pvc:
      name: ubuntu22-trident
      namespace: reference-images
  storage:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: trident-csi-volume

4. Check status of clone

NAME                                                 READY   STATUS              RESTARTS   AGE
pod/ad7acd07-00b1-43f6-a88d-caf3b046461f-source-pod   0/1     ContainerCreating   0          6s

NAME                                          PHASE       PROGRESS   RESTARTS   AGE
datavolume.cdi.kubevirt.io/ubuntu22-trident   Succeeded   100.0%                82m
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
ubuntu22-trident   Bound    pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc   10Gi       RWX            trident-csi-volume   82m                 <---- src
############################################
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME                                 READY   STATUS    RESTARTS   AGE
pod/cdi-upload-ubuntu22-trident-v2   1/1     Running   0          32s

NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/cdi-upload-ubuntu22-trident-v2   ClusterIP   192.16.21.221   <none>        443/TCP   32s

NAME                                              PHASE             PROGRESS   RESTARTS   AGE
datavolume.cdi.kubevirt.io/ubuntu22-trident-v2    CloneInProgress   N/A                   32s
NAME                    STATUS   VOLUME                                     CAPACITY      ACCESS MODES   STORAGECLASS         AGE
ubuntu22-trident-v2     Bound    pvc-ad7acd07-00b1-43f6-a88d-caf3b046461f   11362347344   RWX            trident-csi-volume   32s                   <--- dest

5. Check DV transfer type

k get dv ubuntu22-trident-v2 -o yaml -n test1|grep -B2 -A2 -i clonet
metadata:
  annotations:
    cdi.kubevirt.io/cloneType: network     <------------------- not what I expected / not csi-clone

6. Source PVC info

k get pvc ubuntu22-trident -o yaml -n reference-images
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: Import Complete
    cdi.kubevirt.io/storage.condition.running.reason: Completed
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.import.endpoint: https://build_artifacts/image/jammy-server-cloudimg-amd64.qcow2
    cdi.kubevirt.io/storage.import.importPodName: importer-ubuntu22-trident
    cdi.kubevirt.io/storage.import.secretExtraHeaders.0: artifactory-vmaas-secret
    cdi.kubevirt.io/storage.import.source: http
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"ubuntu22-trident","namespace":"reference-images"},"spec":{"pvc":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"trident-csi-volume","volumeMode":"Filesystem"},"source":{"http":{"secretExtraHeaders":["artifactorysecret"],"url":"https://build_artifacts/image/jammy-server-cloudimg-amd64.qcow2"}}}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
    volume.kubernetes.io/storage-provisioner: csi.trident.netapp.io
  creationTimestamp: "2023-05-23T16:31:45Z"
  finalizers:
  - kubernetes.io/pvc-protection
  - provisioner.storage.kubernetes.io/cloning-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
  name: ubuntu22-trident
  namespace: reference-images
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: ubuntu22-trident
    uid: a2f5c307-bd9a-4f48-9fd0-db2439e7bd2a
  resourceVersion: "1493782"
  uid: 152d57cd-c1ec-4c34-994d-4b21eaaa0abc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: trident-csi-volume
  volumeMode: Filesystem
  volumeName: pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 10Gi
  phase: Bound

7. Destination PVC info

k get pvc ubuntu22-trident-v2 -o yaml -n test1
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.clone.token: ey<redacted>SpY6uEA
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: Clone Complete
    cdi.kubevirt.io/storage.condition.running.reason: Completed
    cdi.kubevirt.io/storage.condition.source.running: "true"
    cdi.kubevirt.io/storage.condition.source.running.message: Clone Complete
    cdi.kubevirt.io/storage.condition.source.running.reason: Completed
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.extended.clone.token: ey<redacted>SpY6uEA
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.pod.ready: "false"
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.sourceClonePodName: ad7acd07-00b1-43f6-a88d-caf3b046461f-source-pod
    cdi.kubevirt.io/storage.uploadPodName: cdi-upload-ubuntu22-trident-v2
    cdi.kubevirt.io/uploadClientName: reference-images/ubuntu22-trident-test1/ubuntu22-trident-v2
    k8s.io/CloneOf: "true"
    k8s.io/CloneRequest: reference-images/ubuntu22-trident
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"ubuntu22-trident-v2","namespace":"test1"},"spec":{"source":{"pvc":{"name":"ubuntu22-trident","namespace":"reference-images"}},"storage":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"trident-csi-volume"}}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
    volume.kubernetes.io/storage-provisioner: csi.trident.netapp.io
  creationTimestamp: "2023-05-23T17:53:46Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
  name: ubuntu22-trident-v2
  namespace: test1
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: ubuntu22-trident-v2
    uid: 6c0b6544-592f-49d8-94fd-8d37a2fd6e87
  resourceVersion: "1504966"
  uid: ad7acd07-00b1-43f6-a88d-caf3b046461f
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: "11362347344"
  storageClassName: trident-csi-volume
  volumeMode: Filesystem
  volumeName: pvc-ad7acd07-00b1-43f6-a88d-caf3b046461f
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: "11362347344"              <------ note doesn't match Spec size of 10Gi either 
  phase: Bound

Additional context:

NetApp Trident CSI storage class

 k get sc trident-csi-volume -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2023-05-20T17:01:28Z"
  name: trident-csi-volume
  resourceVersion: "3155"
  uid: <redacted>
parameters:
  backendType: ontap-nas
  fsType: __FILESYSTEM_TYPE__
  storagePools: <redacted>
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

VolumeMode is Filesystem for both src and dst

accessModes is RWX for both src and dest

Pod cdi-deployment log (partial): pod_cdi-deployment.txt

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): v1.54.2
  • Kubernetes version (use kubectl version): v1.23.10
  • DV specification: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Ubuntu Jammy
  • Kernel (e.g. uname -a): 5.15.0.xxx
  • Install tools: N/A
  • Others: N/A

k8scoder192 · May 23 '23 19:05

The global override is there to force an advanced clone down to a copy clone even when the advanced clone is possible. You can simply not specify it and you will be fine in most cases.

CDI does several checks to see if it can perform an advanced clone (the commands after this list show how to verify the key ones yourself):

  1. Ensure a StorageProfile is defined and populated for the particular storage. NetApp is a known storage provider, so the profile should be populated automatically.
  2. Ensure a CSIDriver resource exists and the name of the driver matches the provisioner string in the storage class you are using. (if not fall back to copy clone)
  3. Ensure the source and target storage classes match (if not fall back to copy clone)
  4. Ensure the source and target volume modes match (if not fall back to copy clone)
  5. Ensure the target size >= source size. If the target size is > the source size, ensure the storage class has volume expansion enabled (if not, fall back to copy clone). We do this because some provisioners do not allow cloning into a size greater than the source, so we essentially make a clone that is the size of the source and then expand it to the target size.
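
For example, checks 2 and 5 can be verified directly with kubectl, using the names from this issue:

# check 2: a CSIDriver whose name matches the StorageClass provisioner
kubectl get csidriver csi.trident.netapp.io

# check 5: whether the StorageClass allows volume expansion (empty output means unset/false)
kubectl get sc trident-csi-volume -o jsonpath='{.allowVolumeExpansion}'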

Looking at what you provided, the storage class doesn't have allowVolumeExpansion set to true, and thus CDI is falling back to copy clone.
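
If your Trident backend supports resize, enabling expansion on the class is a one-line change (a sketch; confirm resize support for your backend first):

kubectl patch sc trident-csi-volume --type merge \
  -p '{"allowVolumeExpansion": true}'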

Copy clone is also known as network clone or host-assisted clone: essentially we create a source pod and a target pod and copy the bytes over the network.

If you change the log level in the CDI operator to 3 or higher, the cdi-deployment logs should contain the exact reason why it is not doing an advanced clone.
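
For example, assuming your CDI version exposes spec.config.logVerbosity on the CDI CR (newer releases do), something like:

kubectl patch cdi cdi --type merge \
  -p '{"spec": {"config": {"logVerbosity": 3}}}'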

awels · May 24 '23 14:05

@awels looking at my example above, the requested SRC size = DST size.

DV clone

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu22-trident-v2
  namespace: test1
spec:
  source:
    pvc:
      name: ubuntu22-trident
      namespace: reference-images
  storage:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi            <-----------------
    storageClassName: trident-csi-volume

Here is the Source PVC (also 10Gi)

ubuntu22-trident   Bound    pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc   10Gi       RWX            trident-csi-volume   82m                 <---- src

Given that, allowVolumeExpansion (point 5 you mentioned) shouldn't need to be enabled. Now, for whatever reason, CDI decided to use "network clone" even though src = dest and all other points (1-4) were met, meaning csi-clone should work. I also showed in my first post that csi-clone works when done manually (PVC-PVC clone, then PVC Object Transfer).

Lastly, and just as a side note, "network clone" seems to make the dest slightly larger, and I'm not sure why.

k8scoder192 · May 25 '23 16:05

Lastly, and just as a side note, "network clone" seems to make the dest slightly larger, and I'm not sure why.

So, using the DV storage API (dv.spec.storage), we inflate the size according to the filesystem overhead configured in the CDI config. This is why you end up with a slightly bigger PVC for the target: https://github.com/kubevirt/containerized-data-importer/blob/c7467cc5fd71f98d89d681dddc6ad79631ee437a/doc/datavolumes.md?plain=1#L432-L434
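
The overhead is configurable, globally or per storage class; for example, a sketch of setting the global value (a fraction of the requested size; the default is 0.055):

kubectl patch cdi cdi --type merge \
  -p '{"spec": {"config": {"filesystemOverhead": {"global": "0.055"}}}}'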

If you use the storage API for the source DV as well, it will end up with the same size as the target (~10.5Gi in your case).
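
For illustration, here is the source DV from the last-applied-configuration above rewritten to use the storage API; only the pvc stanza changes to storage (a sketch, not the exact manifest):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu22-trident
  namespace: reference-images
spec:
  source:
    http:
      secretExtraHeaders:
        - artifactorysecret
      url: https://build_artifacts/image/jammy-server-cloudimg-amd64.qcow2
  storage:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: trident-csi-volume
    volumeMode: Filesystem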

akalenyu · May 30 '23 18:05

@akalenyu Understood, thanks for the clarification. However, as mentioned I see no reason why CDI reverted to using network cloning. I am able to do csi-clone manually with no issues, as reported above in the ticket.

k8scoder192 · May 30 '23 18:05

@akalenyu Understood, thanks for the clarification. However, as mentioned I see no reason why CDI reverted to using network cloning. I am able to do csi-clone manually with no issues, as reported above in the ticket.

Yes, I think you are right; we will try to see if we can be more flexible about not falling back to network clone in such a case.

akalenyu · May 30 '23 19:05

Again, I think the reason it reverted to network clone is that CDI used the fsOverhead to increase the size of the target; now the target is larger than the source, the storage class doesn't allow volume expansion, and CDI thus rejects the csi clone. If you don't want it to expand, you have a few options:

  1. Don't use the storage stanza; instead use pvc, like this:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu22-trident-v2
  namespace: test1
spec:
  source:
    pvc:
      name: ubuntu22-trident
      namespace: reference-images
  pvc:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: trident-csi-volume

This will bypass all the logic in CDI that attempts to autofill values for the created PVC, and just use the passed-in PVC spec as is. Since you are specifying everything anyway, it might make sense to do that.

  2. Let CDI figure everything out for you if you want to make a 1-to-1 clone of a volume, like this:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu22-trident-v2
  namespace: test1
spec:
  source:
    pvc:
      name: ubuntu22-trident
      namespace: reference-images
  storage:
    storageClassName: trident-csi-volume

In this case CDI will figure out the volumeMode, accessMode, and size from the source and the StorageProfile, and create an appropriate PVC that makes an exact 1-to-1 copy (using the best available method).
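
Either way, after the clone completes you can confirm which method CDI picked from the cloneType annotation on the DV (you should see a CSI clone value rather than network):

kubectl get dv ubuntu22-trident-v2 -n test1 \
  -o jsonpath='{.metadata.annotations.cdi\.kubevirt\.io/cloneType}'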

And again, if you raise the logging to v=3 or higher, the cdi-deployment logs will contain the exact reason why CDI decided not to use csi clone.
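
For example (assuming CDI is installed in the cdi namespace):

kubectl logs -n cdi deployment/cdi-deployment | grep -i clone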

awels · May 31 '23 13:05

So, summing up the issue that remains here (please feel free to correct me @k8scoder192): in the CSI clone case, we could just proceed with the size mismatch between src and target and have the CSI driver deal with it as it sees fit. Falling back to host-assisted clone is rather surprising and, as can be seen in this issue, was not necessary (the operation worked manually).

akalenyu · Jun 19 '23 12:06

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot · Sep 17 '23 12:09

/remove-lifecycle stale

akalenyu · Sep 18 '23 08:09

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot · Dec 17 '23 08:12

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot · Jan 16 '24 08:01

/remove-lifecycle rotten

alromeros · Jan 16 '24 11:01

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot · Apr 15 '24 11:04

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot · May 15 '24 11:05

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot · Jun 14 '24 12:06

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubevirt-bot · Jun 14 '24 12:06