containerized-data-importer
CDI cloneStrategy: csi-clone doesn't appear to work
What happened: 1. You cannot set "cloneStrategyOverride" to "csi-clone" because it is not an allowed value in the CRD (the options are clone or snapshot); this needs to be fixed so that csi-clone is a valid option.
2. When "cloneStrategyOverride" is not set in the CDI CR and "cloneStrategy: csi-clone" is set in the appropriate "StorageProfile", the status shows "csi-clone", but what is actually happening is a network (copy) clone. This is verified via
kubectl get dv ubuntu22-trident-v2 -o yaml -n test1|grep -B2 -i clonet
metadata:
annotations:
cdi.kubevirt.io/cloneType: network <-------------------
What you expected to happen: Setting "cloneStrategy: csi-clone" in the StorageProfile should enable and perform a CSI clone.
FYI: I was able to successfully perform a CSI clone manually via the procedure below. This leads me to believe something is wrong in the CDI logic that checks whether a CSI clone is possible:
- PVC-PVC clone
- PVC Object Transfer
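Roughly, the manual procedure looked like the manifests below (reconstructed for illustration; the clone and transfer names are mine, not copied from my cluster): first a CSI PVC-to-PVC clone inside the source namespace, then a CDI ObjectTransfer to move the cloned PVC to the target namespace.
# 1) CSI clone: new PVC in the same namespace, using the source PVC as dataSource
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ubuntu22-trident-clone      # illustrative name
  namespace: reference-images
spec:
  storageClassName: trident-csi-volume
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  dataSource:
    kind: PersistentVolumeClaim
    name: ubuntu22-trident
  resources:
    requests:
      storage: 10Gi
---
# 2) ObjectTransfer (cluster-scoped): move the cloned PVC into the target namespace
apiVersion: cdi.kubevirt.io/v1beta1
kind: ObjectTransfer
metadata:
  name: transfer-ubuntu22-trident   # illustrative name
spec:
  source:
    kind: PersistentVolumeClaim
    namespace: reference-images
    name: ubuntu22-trident-clone
  target:
    namespace: test1
    name: ubuntu22-trident-v2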
How to reproduce it (as minimally and precisely as possible): 1. Ensure "cloneStrategyOverride" is NOT set in the CDI CR
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
annotations:
cdi.kubevirt.io/configAuthority: ""
creationTimestamp: "2023-05-21T14:47:41Z"
finalizers:
- operator.cdi.kubevirt.io
generation: 13
name: cdi
resourceVersion: "1502710"
uid: 25297a88-9638-4180-906f-9d30e277ab20
spec:
config:
podResourceRequirements:
limits:
cpu: 600m
memory: 1Gi
requests:
cpu: 300m
memory: 250Mi
imagePullPolicy: Always
infra:
nodeSelector:
kubernetes.io/os: linux
tolerations:
- key: CriticalAddonsOnly
operator: Exists
workload:
nodeSelector:
kubernetes.io/os: linux
status:
conditions:
- lastHeartbeatTime: "2023-05-21T14:49:01Z"
lastTransitionTime: "2023-05-21T14:49:01Z"
message: Deployment Completed
reason: DeployCompleted
status: "True"
type: Available
- lastHeartbeatTime: "2023-05-21T14:49:01Z"
lastTransitionTime: "2023-05-21T14:49:01Z"
status: "False"
type: Progressing
- lastHeartbeatTime: "2023-05-21T18:49:51Z"
lastTransitionTime: "2023-05-21T18:49:51Z"
status: "False"
type: Degraded
observedVersion: v1.54.2
operatorVersion: v1.54.2
phase: Deployed
targetVersion: v1.54.2
2. Set cloneStrategy to csi-clone in the appropriate StorageProfile
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
creationTimestamp: "2023-05-21T14:48:50Z"
generation: 11
labels:
app: containerized-data-importer
app.kubernetes.io/component: storage
app.kubernetes.io/managed-by: cdi-controller
cdi.kubevirt.io: ""
name: trident-csi-volume
ownerReferences:
- apiVersion: cdi.kubevirt.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: CDI
name: cdi
uid: 25297a88-9638-4180-906f-9d30e277ab20
resourceVersion: "1491139"
uid: 5d2ea494-e876-4ef5-848d-269d3117b2f0
spec:
claimPropertySets:
- accessModes:
- ReadWriteMany
volumeMode: Filesystem
cloneStrategy: csi-clone <-------- set
status:
claimPropertySets:
- accessModes:
- ReadWriteMany
volumeMode: Filesystem
cloneStrategy: csi-clone <----- status confirms
provisioner: csi.trident.netapp.io
storageClass: trident-csi-volume
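For reference, the strategy can be set on the (controller-owned) StorageProfile with a merge patch along these lines (illustrative command, not copied from my shell history):
kubectl patch storageprofile trident-csi-volume --type merge -p '{"spec": {"cloneStrategy": "csi-clone"}}'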
3. Apply the smart-clone DataVolume
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: ubuntu22-trident-v2
namespace: test1
spec:
source:
pvc:
name: ubuntu22-trident
namespace: reference-images
storage:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi
storageClassName: trident-csi-volume
4. Check status of clone
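(The output below was gathered with roughly the following commands, for the source and destination namespaces; they were not captured verbatim:)
kubectl get pod,dv,pvc -n reference-images
kubectl get pod,svc,dv,pvc -n test1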
NAME READY STATUS RESTARTS AGE
pod/ad7acd07-00b1-43f6-a88d-caf3b046461f-source-pod 0/1 ContainerCreating 0 6s
NAME PHASE PROGRESS RESTARTS AGE
datavolume.cdi.kubevirt.io/ubuntu22-trident Succeeded 100.0% 82m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ubuntu22-trident Bound pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc 10Gi RWX trident-csi-volume 82m <---- src
############################################
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME READY STATUS RESTARTS AGE
pod/cdi-upload-ubuntu22-trident-v2 1/1 Running 0 32s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cdi-upload-ubuntu22-trident-v2 ClusterIP 192.16.21.221 <none> 443/TCP 32s
NAME PHASE PROGRESS RESTARTS AGE
datavolume.cdi.kubevirt.io/ubuntu22-trident-v2 CloneInProgress N/A 32s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ubuntu22-trident-v2 Bound pvc-ad7acd07-00b1-43f6-a88d-caf3b046461f 11362347344 RWX trident-csi-volume 32s <--- dest
5. Check DV transfer type
k get dv ubuntu22-trident-v2 -o yaml -n test1|grep -B2 -A2 -i clonet
metadata:
annotations:
cdi.kubevirt.io/cloneType: network <------------------- not what I expected / not csi-clone
6. Source PVC info
k get pvc ubuntu22-trident -o yaml -n reference-images
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
cdi.kubevirt.io/storage.condition.running: "false"
cdi.kubevirt.io/storage.condition.running.message: Import Complete
cdi.kubevirt.io/storage.condition.running.reason: Completed
cdi.kubevirt.io/storage.contentType: kubevirt
cdi.kubevirt.io/storage.import.endpoint: https://build_artifacts/image/jammy-server-cloudimg-amd64.qcow2
cdi.kubevirt.io/storage.import.importPodName: importer-ubuntu22-trident
cdi.kubevirt.io/storage.import.secretExtraHeaders.0: artifactory-vmaas-secret
cdi.kubevirt.io/storage.import.source: http
cdi.kubevirt.io/storage.pod.phase: Succeeded
cdi.kubevirt.io/storage.pod.restarts: "0"
cdi.kubevirt.io/storage.preallocation.requested: "false"
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"ubuntu22-trident","namespace":"reference-images"},"spec":{"pvc":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"trident-csi-volume","volumeMode":"Filesystem"},"source":{"http":{"secretExtraHeaders":["artifactorysecret"],"url":"https://build_artifacts/image/jammy-server-cloudimg-amd64.qcow2"}}}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
volume.kubernetes.io/storage-provisioner: csi.trident.netapp.io
creationTimestamp: "2023-05-23T16:31:45Z"
finalizers:
- kubernetes.io/pvc-protection
- provisioner.storage.kubernetes.io/cloning-protection
labels:
alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
app: containerized-data-importer
app.kubernetes.io/component: storage
app.kubernetes.io/managed-by: cdi-controller
name: ubuntu22-trident
namespace: reference-images
ownerReferences:
- apiVersion: cdi.kubevirt.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: DataVolume
name: ubuntu22-trident
uid: a2f5c307-bd9a-4f48-9fd0-db2439e7bd2a
resourceVersion: "1493782"
uid: 152d57cd-c1ec-4c34-994d-4b21eaaa0abc
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi
storageClassName: trident-csi-volume
volumeMode: Filesystem
volumeName: pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc
status:
accessModes:
- ReadWriteMany
capacity:
storage: 10Gi
phase: Bound
7. Destination PVC info
k get pvc ubuntu22-trident-v2 -o yaml -n test1
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
cdi.kubevirt.io/storage.clone.token: ey<redacted>SpY6uEA
cdi.kubevirt.io/storage.condition.running: "false"
cdi.kubevirt.io/storage.condition.running.message: Clone Complete
cdi.kubevirt.io/storage.condition.running.reason: Completed
cdi.kubevirt.io/storage.condition.source.running: "true"
cdi.kubevirt.io/storage.condition.source.running.message: Clone Complete
cdi.kubevirt.io/storage.condition.source.running.reason: Completed
cdi.kubevirt.io/storage.contentType: kubevirt
cdi.kubevirt.io/storage.extended.clone.token: ey<redacted>SpY6uEA
cdi.kubevirt.io/storage.pod.phase: Succeeded
cdi.kubevirt.io/storage.pod.ready: "false"
cdi.kubevirt.io/storage.pod.restarts: "0"
cdi.kubevirt.io/storage.preallocation.requested: "false"
cdi.kubevirt.io/storage.sourceClonePodName: ad7acd07-00b1-43f6-a88d-caf3b046461f-source-pod
cdi.kubevirt.io/storage.uploadPodName: cdi-upload-ubuntu22-trident-v2
cdi.kubevirt.io/uploadClientName: reference-images/ubuntu22-trident-test1/ubuntu22-trident-v2
k8s.io/CloneOf: "true"
k8s.io/CloneRequest: reference-images/ubuntu22-trident
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"ubuntu22-trident-v2","namespace":"test1"},"spec":{"source":{"pvc":{"name":"ubuntu22-trident","namespace":"reference-images"}},"storage":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"trident-csi-volume"}}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
volume.kubernetes.io/storage-provisioner: csi.trident.netapp.io
creationTimestamp: "2023-05-23T17:53:46Z"
finalizers:
- kubernetes.io/pvc-protection
labels:
alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
app: containerized-data-importer
app.kubernetes.io/component: storage
app.kubernetes.io/managed-by: cdi-controller
name: ubuntu22-trident-v2
namespace: test1
ownerReferences:
- apiVersion: cdi.kubevirt.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: DataVolume
name: ubuntu22-trident-v2
uid: 6c0b6544-592f-49d8-94fd-8d37a2fd6e87
resourceVersion: "1504966"
uid: ad7acd07-00b1-43f6-a88d-caf3b046461f
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: "11362347344"
storageClassName: trident-csi-volume
volumeMode: Filesystem
volumeName: pvc-ad7acd07-00b1-43f6-a88d-caf3b046461f
status:
accessModes:
- ReadWriteMany
capacity:
storage: "11362347344" <------ note doesn't match Spec size of 10Gi either
phase: Bound
Additional context:
NetApp Trident CSI storage class
k get sc trident-csi-volume -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
creationTimestamp: "2023-05-20T17:01:28Z"
name: trident-csi-volume
resourceVersion: "3155"
uid: <redacted>
parameters:
backendType: ontap-nas
fsType: __FILESYSTEM_TYPE__
storagePools: <redacted>
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
VolumeMode is Filesystem for both src and dst
accessModes is RWX for both src and dest
Pod cdi-deployment log (partial) pod_cdi-deployment.txt
Environment:
- CDI version (use kubectl get deployments cdi-deployment -o yaml): v1.54.2
- Kubernetes version (use kubectl version): v1.23.10
- DV specification: N/A
- Cloud provider or hardware configuration: N/A
- OS (e.g. from /etc/os-release): Ubuntu Jammy
- Kernel (e.g. uname -a): 5.15.0.xxx
- Install tools: N/A
- Others: N/A
The global override exists to force an advanced clone down to a copy clone even when an advanced clone is possible. You can simply leave it unset and you will be fine in most cases.
CDI does several checks to see if it can perform an advanced clone:
- Ensure a StorageProfile is defined and populated for the particular storage. NetApp Trident is a known provisioner, so the profile should be populated automatically.
- Ensure a CSIDriver resource exists and the driver name matches the provisioner string in the storage class you are using (if not, fall back to copy clone).
- Ensure the source and target storage classes match (if not, fall back to copy clone).
- Ensure the source and target volume modes match (if not, fall back to copy clone).
- Ensure the target size >= the source size. If the target size is > the source size, ensure the storage class has volume expansion enabled (if not, fall back to copy clone). We do this because some provisioners don't allow cloning into a size larger than the source, so we essentially create a clone the size of the source and then expand it to the target size.
Looking at what you provided, the storage class doesn't have allowVolumeExpansion set to true, and thus CDI is falling back to copy clone.
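For reference, the CSIDriver and expansion prerequisites can be checked directly; the commands below are illustrative, plugging in the provisioner and storage class from this issue:
# does a CSIDriver object exist whose name matches the provisioner?
kubectl get csidriver csi.trident.netapp.io
# is volume expansion allowed on the storage class? (empty output means it is not set)
kubectl get sc trident-csi-volume -o jsonpath='{.allowVolumeExpansion}'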
Copy clone is also known as network clone or host-assisted clone: we essentially create a source and a target pod and copy the bytes over the network.
If you change the log level in the CDI operator to 3 or higher, the cdi-deployment logs should contain the exact reason why it is not doing an advanced clone.
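For example, something along these lines (illustrative; the logVerbosity field under spec.config is my assumption and may differ between CDI versions, and the namespace depends on where CDI is installed):
# raise CDI log verbosity (field name assumed; verify against your CDI version)
kubectl patch cdi cdi --type merge -p '{"spec": {"config": {"logVerbosity": 3}}}'
# then look for the clone fallback reason in the controller log
kubectl logs -n cdi deployment/cdi-deployment | grep -i clone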
@awels looking at my example above, the requested SRC size = DST size.
DV clone
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: ubuntu22-trident-v2
namespace: test1
spec:
source:
pvc:
name: ubuntu22-trident
namespace: reference-images
storage:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi <-----------------
storageClassName: trident-csi-volume
Here is the Source PVC (also 10Gi)
ubuntu22-trident Bound pvc-152d57cd-c1ec-4c34-994d-4b21eaaa0abc 10Gi RWX trident-csi-volume 82m <---- src
Given that, allowVolumeExpansion (point 5 you mentioned) shouldn't need to be enabled. Yet, for whatever reason, CDI decided to use "network clone" even though src = dest and all the other points (1-4) were met, meaning csi-clone should work. I also showed in my first post that a csi-clone works when done manually (PVC-PVC clone, then PVC ObjectTransfer).
Lastly, and just as a side note, "network clone" seems to make the dest slightly larger, and I'm not sure why.
So using the DV storage API (dv.spec.storage) we inflate the size according to filesystem overhead configured in CDI config.
This is why you end up with a slightly bigger PVC for the target
https://github.com/kubevirt/containerized-data-importer/blob/c7467cc5fd71f98d89d681dddc6ad79631ee437a/doc/datavolumes.md?plain=1#L432-L434
If you use the storage API for the source DV as well, it will end up with the same size as the target (~10.5Gi in your case).
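For reference, the numbers line up with the default filesystem overhead of 0.055 (assuming it hasn't been overridden in the CDI config): 10Gi = 10737418240 bytes, and 10737418240 / (1 - 0.055) ≈ 11362347344 bytes, which is exactly the capacity of the target PVC shown above.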
@akalenyu Understood, thanks for the clarification. However, as mentioned I see no reason why CDI reverted to using network cloning. I am able to do csi-clone manually with no issues, as reported above in the ticket.
Yes, I think you are right; will try to see if we can be more flexible about not falling back to network clone in such a case.
Again, I think the reason it reverted to network clone is that it used the fsOverhead to increase the size of the target; the target is now larger than the source, it doesn't see allowVolumeExpansion, and thus it rejects the CSI clone. If you want it to not expand, you have a few options:
1. Don't use the storage stanza, and instead use pvc, like this:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: ubuntu22-trident-v2
namespace: test1
spec:
source:
pvc:
name: ubuntu22-trident
namespace: reference-images
pvc:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi
storageClassName: trident-csi-volume
This will bypass all the logic in CDI that attempts to autofill values for the created PVC, and just use the passed-in PVC as-is. Since you are specifying everything anyway, it might make sense to do that.
2. Let CDI figure everything out for you if you want to make a 1-to-1 clone of a volume, like this:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: ubuntu22-trident-v2
namespace: test1
spec:
source:
pvc:
name: ubuntu22-trident
namespace: reference-images
storage:
storageClassName: trident-csi-volume
In this case CDI will figure out the volumeMode, accessMode, and size from the source and the StorageProfile, and create an appropriate PVC that makes an exact 1-to-1 copy (using the best available method).
And again, if you get the logging to v=3 or higher, the cdi-deployment logs will contain the exact reason why CDI decided not to use a CSI clone.
So, summing up the issue that remains here (please feel free to correct me @k8scoder192): in the CSI clone case, we could just proceed with the size mismatch between source and target and let the CSI driver deal with it as it sees fit. Falling back to host-assisted clone is rather surprising and, as can be seen in the issue, was not necessary (the operation worked manually).
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle rotten
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kubevirt-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.