containerized-data-importer icon indicating copy to clipboard operation
containerized-data-importer copied to clipboard

Host assisted clone should adjust requested storage when cloning from block to filesystem volume mode (#3900)

Open gaohoward opened this issue 3 months ago • 27 comments

When cloning from a source, the host cloner should check the requested size for the target tmp pvc if the tmp source pvc is of block volume mode and the target pvc is of filesystem volume mode because it should take filesystem overhead into consideration and enlarge the original request size accordingly before creating the tmp pvc.

Because currently the check is missing the cloning will fail on validation.

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

https://issues.redhat.com/browse/CNV-44140

Special notes for your reviewer:

Release note:

Fix: Host assisted clone should adjust requested storage when cloning from block to filesystem volume mode

gaohoward avatar Sep 10 '25 13:09 gaohoward

Hi @gaohoward. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubevirt-bot avatar Sep 10 '25 13:09 kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign aglitke for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

kubevirt-bot avatar Sep 10 '25 13:09 kubevirt-bot

/test all

akalenyu avatar Sep 11 '25 07:09 akalenyu

Coverage Status

coverage: 58.65% (+0.003%) from 58.647% when pulling 86651108a33762b1234a0a9230b1ee647e49a32f on gaohoward:cnv-44140 into 094020bca66a12babc8cfac0d303963f541a40c8 on kubevirt:main.

coveralls avatar Sep 11 '25 07:09 coveralls

I am probably missing something, but didn't https://github.com/kubevirt/containerized-data-importer/pull/3384 fix this particular issue? Or did I miss a particular case with that PR?

awels avatar Oct 13 '25 13:10 awels

I am probably missing something, but didn't #3384 fix this particular issue? Or did I miss a particular case with that PR?

Yeah I think it goes a different path. In this case it falls back to host-clone from smart-clone (supported by provisioner), and in that case it didn't take the fs overhead into account.

gaohoward avatar Oct 13 '25 14:10 gaohoward

Gotcha, okay, let me know if you need a review on the PR.

awels avatar Oct 13 '25 14:10 awels

Gotcha, okay, let me know if you need a review on the PR.

Yes pleas review. Thanks @awels !

gaohoward avatar Oct 14 '25 08:10 gaohoward

/retest

gaohoward avatar Oct 23 '25 09:10 gaohoward

/retest

gaohoward avatar Oct 23 '25 14:10 gaohoward

/retest

gaohoward avatar Oct 27 '25 01:10 gaohoward

/retest

gaohoward avatar Nov 06 '25 15:11 gaohoward

/retest

gaohoward avatar Nov 11 '25 13:11 gaohoward

@awels Hi Alexander, I finally get the tests pass, can you please take a look again?

Thanks

gaohoward avatar Nov 13 '25 12:11 gaohoward

/retest

gaohoward avatar Nov 17 '25 13:11 gaohoward

/retest

gaohoward avatar Nov 17 '25 14:11 gaohoward

/retest

gaohoward avatar Dec 02 '25 07:12 gaohoward

/retest

gaohoward avatar Dec 03 '25 03:12 gaohoward

/retest

gaohoward avatar Dec 08 '25 14:12 gaohoward

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue.

I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140

Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

gaohoward avatar Dec 11 '25 02:12 gaohoward

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue. I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140

Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

I see, though I am still confused by inflating this far into the process. Doesn't the target PVC get inflated anyhow by the webhook/DV controller (and thus so does the temp)?

akalenyu avatar Dec 11 '25 17:12 akalenyu

Also note that bug describes a separate issue that may still exist

$ oc create  -f dv-hpp-cloned-from-datasource.yaml
Error from server: error when creating "dv-hpp-cloned-from-datasource.yaml": admission webhook "datavolume-validate.cdi.kubevirt.io" denied the request:  Storage size is missing

akalenyu avatar Dec 11 '25 17:12 akalenyu

/cc @jpeimer

akalenyu avatar Dec 11 '25 17:12 akalenyu

@akalenyu: GitHub didn't allow me to request PR reviews from the following users: jpeimer.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jpeimer

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubevirt-bot avatar Dec 11 '25 17:12 kubevirt-bot

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue. I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140 Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

I see, though I am still confused by inflating this far into the process. Doesn't the target PVC get inflated anyhow by the webhook/DV controller (and thus so does the temp)?

No the target PVC doesn't get inflated when doing snapshot clone. When the controller plans the clone it finds out the storage class is different from the volume snapshot's one and then it fall back to host assisted clone. When doing host clone phase it create the tmp target PVC, if we don't check here then it will report the failure because the target size is not enough (for this to happen the tmp target PVC volume mode must be filesystem).

gaohoward avatar Dec 12 '25 10:12 gaohoward

dv-hpp-cloned-from-datasource.yaml

It doesn't happen for me on upstream. Can you share your yaml file?

gaohoward avatar Dec 12 '25 10:12 gaohoward

dv-hpp-cloned-from-datasource.yaml

It doesn't happen for me on upstream. Can you share your yaml file?

Yeah I don't see it either, may have to do with an older version

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue. I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140 Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

I see, though I am still confused by inflating this far into the process. Doesn't the target PVC get inflated anyhow by the webhook/DV controller (and thus so does the temp)?

No the target PVC doesn't get inflated when doing snapshot clone. When the controller plans the clone it finds out the storage class is different from the volume snapshot's one and then it fall back to host assisted clone. When doing host clone phase it create the tmp target PVC, if we don't check here then it will report the failure because the target size is not enough (for this to happen the tmp target PVC volume mode must be filesystem).

I was pretty sure we inflate filesystem target PVCs here - https://github.com/kubevirt/containerized-data-importer/blob/e147bbdf17252d9bd3b29738cf12c0734c9e6884/pkg/controller/datavolume/util.go#L314 Sorry for poking I just think we have a problem elsewhere and that inflating the temp PVC in the populator will just cover that up

akalenyu avatar Dec 16 '25 13:12 akalenyu

dv-hpp-cloned-from-datasource.yaml

It doesn't happen for me on upstream. Can you share your yaml file?

Yeah I don't see it either, may have to do with an older version

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue. I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140 Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

I see, though I am still confused by inflating this far into the process. Doesn't the target PVC get inflated anyhow by the webhook/DV controller (and thus so does the temp)?

No the target PVC doesn't get inflated when doing snapshot clone. When the controller plans the clone it finds out the storage class is different from the volume snapshot's one and then it fall back to host assisted clone. When doing host clone phase it create the tmp target PVC, if we don't check here then it will report the failure because the target size is not enough (for this to happen the tmp target PVC volume mode must be filesystem).

I was pretty sure we inflate filesystem target PVCs here -

https://github.com/kubevirt/containerized-data-importer/blob/e147bbdf17252d9bd3b29738cf12c0734c9e6884/pkg/controller/datavolume/util.go#L314 Sorry for poking I just think we have a problem elsewhere and that inflating the target in the populator will just cover that up

This method is never called in this case. Actually the webhook rendering is not enabled. For snapshot cloning it basically fallback to host-assisted during the planning phase, and when the host-clone phase is handling the temp target pvc creation, it has a chance to make sure the target size is right (inflate if necessary).

gaohoward avatar Dec 16 '25 14:12 gaohoward

dv-hpp-cloned-from-datasource.yaml

It doesn't happen for me on upstream. Can you share your yaml file?

Yeah I don't see it either, may have to do with an older version

Thank you for tackling this! I would appreciate a link to the bug so I can do some testing and see the origin of the issue. I am a little confused that we have to inflate in the populator layer; usually we do this in the mutating pvc webhook (on by default)/dv controllers

Here you are: https://issues.redhat.com/browse/CNV-44140 Yes the issue happens at the very late stage where the snapshot clone falls back to host assisted clone.

I see, though I am still confused by inflating this far into the process. Doesn't the target PVC get inflated anyhow by the webhook/DV controller (and thus so does the temp)?

No the target PVC doesn't get inflated when doing snapshot clone. When the controller plans the clone it finds out the storage class is different from the volume snapshot's one and then it fall back to host assisted clone. When doing host clone phase it create the tmp target PVC, if we don't check here then it will report the failure because the target size is not enough (for this to happen the tmp target PVC volume mode must be filesystem).

I was pretty sure we inflate filesystem target PVCs here - https://github.com/kubevirt/containerized-data-importer/blob/e147bbdf17252d9bd3b29738cf12c0734c9e6884/pkg/controller/datavolume/util.go#L314

Sorry for poking I just think we have a problem elsewhere and that inflating the target in the populator will just cover that up

This method is never called in this case. Actually the webhook rendering is not enabled. For snapshot cloning it basically fallback to host-assisted during the planning phase, and when the host-clone phase is handling the temp target pvc creation, it has a chance to make sure the target size is right (inflate if necessary).

Webhook rendering is enabled since a few releases now, but anyway renderPvcSpecVolumeSize existed prior to it for the DV storage API. How come the target doesn't get inflated?

akalenyu avatar Dec 16 '25 15:12 akalenyu

I think the real issue here is that we don't inflate the PVC during mutation if it has a data source: https://github.com/kubevirt/containerized-data-importer/blob/e1bfde033dc8c83ecdbfea0e555f35f42704c03a/pkg/controller/datavolume/util.go#L65-L88

https://github.com/kubevirt/containerized-data-importer/blob/e1bfde033dc8c83ecdbfea0e555f35f42704c03a/pkg/controller/datavolume/util.go#L246-L255

We simply set the restore size if it's available and leave it as is otherwise.

Acedus avatar Dec 17 '25 12:12 Acedus