
backend-storage: add RWO FS support

Open · jean-edouard opened this pull request 1 year ago • 15 comments

What this PR does

See https://github.com/kubevirt/community/pull/314

Before this PR:

After this PR:

Fixes #

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

Checklist

This checklist is not enforced, but it is a reminder of items that could be relevant to every PR. Approvers are expected to review this list.

Release note

backend-storage now supports RWO FS

jean-edouard avatar Aug 19 '24 15:08 jean-edouard
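For context on the release note above, here is a minimal sketch of a VirtualMachineInstance that relies on backend-storage for persistent TPM and EFI state, which is the kind of VM this change targets. It is not code from this PR: the field names follow my reading of the kubevirt.io/api/core/v1 Go types and should be treated as assumptions to verify, and the note about vmStateStorageClass refers to existing KubeVirt configuration rather than anything introduced here.

```go
// A sketch only, not code from this PR: a VMI whose TPM and EFI state is
// persisted by backend-storage. With RWO FS support, the PVC that
// backend-storage creates for this state can come from a storage class that
// only offers ReadWriteOnce Filesystem volumes, which is what this PR adds.
package main

import (
	"fmt"

	kvv1 "kubevirt.io/api/core/v1"
)

func boolPtr(b bool) *bool { return &b }

func main() {
	vmi := &kvv1.VirtualMachineInstance{}
	vmi.Name = "vm-with-persistent-state"

	// Persistent TPM state lives in the backend-storage PVC.
	vmi.Spec.Domain.Devices.TPM = &kvv1.TPMDevice{Persistent: boolPtr(true)}

	// Persistent EFI NVRAM is kept there as well.
	vmi.Spec.Domain.Firmware = &kvv1.Firmware{
		Bootloader: &kvv1.Bootloader{
			EFI: &kvv1.EFI{SecureBoot: boolPtr(false), Persistent: boolPtr(true)},
		},
	}

	// Which storage class backs that PVC is a cluster-level setting
	// (vmStateStorageClass in the KubeVirt CR configuration), not a VMI field.
	fmt.Printf("TPM: %+v, EFI: %+v\n",
		vmi.Spec.Domain.Devices.TPM, vmi.Spec.Domain.Firmware.Bootloader.EFI)
}
```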

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

kubevirt-bot avatar Aug 19 '24 15:08 kubevirt-bot

/test all

jean-edouard avatar Aug 19 '24 15:08 jean-edouard

/test all

jean-edouard avatar Aug 19 '24 17:08 jean-edouard

/test all

jean-edouard avatar Aug 19 '24 20:08 jean-edouard

/test all

jean-edouard avatar Aug 26 '24 18:08 jean-edouard

/test all

jean-edouard avatar Aug 27 '24 13:08 jean-edouard

/cc @acardace

jean-edouard avatar Sep 12 '24 13:09 jean-edouard

/retest

jean-edouard avatar Sep 17 '24 12:09 jean-edouard

/retest

jean-edouard avatar Oct 02 '24 21:10 jean-edouard

@jean-edouard Left another review; still some things to adjust, but I think we're almost there!

acardace avatar Oct 04 '24 10:10 acardace

@dhiller is there a way to mark tests as incompatible with the flake-checker? I duplicated 4 tests into sig-compute to test the behavior when NFS is absent. They fail on the flake checker since its environment does have NFS... Thanks!

jean-edouard avatar Oct 08 '24 00:10 jean-edouard
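For readers following this thread, here is a minimal Ginkgo sketch of the kind of test labeling being discussed. The label name no-flake-check and the --label-filter expression are hypothetical; the actual label and how automation/repeated_test.sh consumes it are whatever this PR and its follow-ups settle on.

```go
// A sketch of the labeling idea discussed above; the label name and filter
// expression are hypothetical, not necessarily what this PR uses.
package tests_test

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// Hypothetical label; the flake checker would exclude specs carrying it, e.g.
// by running ginkgo with --label-filter='!no-flake-check'.
const labelNoFlakeCheck = "no-flake-check"

var _ = Describe("[sig-compute] backend-storage without NFS", func() {
	It("uses an RWO FS PVC when no RWX storage class is available",
		Label(labelNoFlakeCheck), func() {
			// The flake-check environment provides NFS, so this scenario cannot
			// be exercised there; on a cluster without NFS it is expected to pass.
			Expect(true).To(BeTrue()) // placeholder for the real assertions
		})
})
```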

/approve Thank you @jean-edouard !

acardace avatar Oct 16 '24 07:10 acardace

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acardace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

kubevirt-bot avatar Oct 16 '24 07:10 kubevirt-bot

/test all

acardace avatar Oct 18 '24 07:10 acardace

@jean-edouard kudos for the great idea to add a separate label for flake-check compatibility.

I figure we need to add the label here: https://github.com/kubevirt/kubevirt/blob/main/automation/repeated_test.sh#L199

Would you be up to creating a follow-up PR marking all the other tests that have been filtered out in other ways? (Since I just came back, I might have that in my emails - apologies for pinging you in that case)

dhiller avatar Oct 21 '24 11:10 dhiller

> @jean-edouard kudos for the great idea to add a separate label for flake-check compatibility.
>
> I figure we need to add the label here: https://github.com/kubevirt/kubevirt/blob/main/automation/repeated_test.sh#L199

Isn't that what I already do? :)

> Would you be up to creating a follow-up PR marking all the other tests that have been filtered out in other ways? (Since I just came back, I might have that in my emails - apologies for pinging you in that case)

Sure, do you have any pointers?

jean-edouard avatar Oct 21 '24 12:10 jean-edouard

/test pull-kubevirt-check-tests-for-flakes

fossedihelm avatar Oct 22 '24 07:10 fossedihelm

>> @jean-edouard kudos for the great idea to add a separate label for flake-check compatibility. I figure we need to add the label here: https://github.com/kubevirt/kubevirt/blob/main/automation/repeated_test.sh#L199
>
> Isn't that what I already do? :)

Yeah - glancing at this PR I just missed it, you are right :blush:

>> Would you be up to creating a follow-up PR marking all the other tests that have been filtered out in other ways? (Since I just came back, I might have that in my emails - apologies for pinging you in that case)
>
> Sure, do you have any pointers?

I was thinking about what is currently achieved with the exclude-native-ssh label, but maybe @0xFelix has a better POV here about whether this makes sense. It wouldn't hurt to broadcast this new label anyway, so that everyone knows, e.g. by adding a paragraph here: https://github.com/kubevirt/kubevirt/blob/91e5d7899442f97b2635c82bd0d446c92969dc99/tests/README.md?plain=1#L3

dhiller avatar Oct 22 '24 10:10 dhiller

Thank you for the patience @jean-edouard /lgtm

fossedihelm avatar Oct 22 '24 12:10 fossedihelm

Required labels detected, running phase 2 presubmits:

/test pull-kubevirt-e2e-windows2016
/test pull-kubevirt-e2e-kind-1.30-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.30-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.29-sig-network
/test pull-kubevirt-e2e-k8s-1.29-sig-storage
/test pull-kubevirt-e2e-k8s-1.29-sig-compute
/test pull-kubevirt-e2e-k8s-1.29-sig-operator
/test pull-kubevirt-e2e-k8s-1.30-sig-network
/test pull-kubevirt-e2e-k8s-1.30-sig-storage
/test pull-kubevirt-e2e-k8s-1.30-sig-compute
/test pull-kubevirt-e2e-k8s-1.30-sig-operator

kubevirt-commenter-bot avatar Oct 22 '24 12:10 kubevirt-commenter-bot

/retest-required

stu-gott avatar Oct 22 '24 17:10 stu-gott

@jean-edouard: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubevirt-e2e-k8s-1.30-sig-storage | 60b5db4bd102f27059a24a25ee7f8423bd3f98ab | link | unknown | /test pull-kubevirt-e2e-k8s-1.30-sig-storage |
| pull-kubevirt-e2e-k8s-1.31-sig-compute-migrations | 60b5db4bd102f27059a24a25ee7f8423bd3f98ab | link | true | /test pull-kubevirt-e2e-k8s-1.31-sig-compute-migrations |
| pull-kubevirt-e2e-k8s-1.31-sig-network | 60b5db4bd102f27059a24a25ee7f8423bd3f98ab | link | true | /test pull-kubevirt-e2e-k8s-1.31-sig-network |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

kubevirt-bot avatar Oct 22 '24 19:10 kubevirt-bot

/test pull-kubevirt-e2e-k8s-1.31-sig-network
/test pull-kubevirt-e2e-k8s-1.31-sig-compute-migrations

alromeros avatar Oct 22 '24 19:10 alromeros

@jean-edouard The cache fixes are required in previous versions

xpivarc avatar Nov 01 '24 14:11 xpivarc

@jean-edouard @acardace I only noticed this PR now; I didn't expect it would be merged while there are so many disagreements on the proposal! This PR is also affected by the race mentioned in the proposal (https://issues.redhat.com/browse/CNV-50720) and by https://github.com/kubevirt/kubevirt/issues/13112

vladikr avatar Nov 05 '24 14:11 vladikr

@jean-edouard could you please open an upstream issue for https://issues.redhat.com/browse/CNV-50720

vladikr avatar Nov 05 '24 14:11 vladikr

> @jean-edouard could you please open an upstream issue for https://issues.redhat.com/browse/CNV-50720

Wouldn't it be basically the same as https://github.com/kubevirt/kubevirt/issues/13112 ?

jean-edouard avatar Nov 05 '24 16:11 jean-edouard

>> @jean-edouard could you please open an upstream issue for https://issues.redhat.com/browse/CNV-50720
>
> Wouldn't it be basically the same as #13112 ?

It's similar: even if we fix #13112, we will still have a problem when both the source and the target crash.

In #13112, the controller can observe that the target pod is happily running on the target already, so it can finalize the migration (one possible solution).

In the case when both the source and the target are gone, we don't know how to finalize the migration, because we don't know which volume was updated last.

As it is today, if the source pod is gone without communicating the results to the handler, we cannot tell which volume to choose when the VM starts again.

vladikr avatar Nov 05 '24 18:11 vladikr
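To make the failure mode vladikr describes concrete, here is a rough sketch of the decision a controller would face when finalizing the migration of backend-storage state. The types and names are hypothetical, not KubeVirt code; only the branching mirrors the scenarios described above.

```go
// Hypothetical types and logic, not KubeVirt code: the choice a controller
// would have to make when finalizing a migration of backend-storage state.
package main

import (
	"errors"
	"fmt"
)

type migrationState struct {
	sourcePodAlive bool
	targetPodAlive bool
	// handlerResult is non-nil only if virt-handler managed to record whether
	// the migration succeeded before the pods went away.
	handlerResult *bool
}

// pickAuthoritativeVolume illustrates the race: when both pods are gone and no
// result was recorded, there is no way to tell which volume was updated last.
func pickAuthoritativeVolume(m migrationState) (string, error) {
	switch {
	case m.handlerResult != nil && *m.handlerResult:
		return "target-volume", nil
	case m.handlerResult != nil:
		return "source-volume", nil
	case m.targetPodAlive:
		// The #13112 case: the controller can still observe the running target.
		return "target-volume", nil
	case m.sourcePodAlive:
		return "source-volume", nil
	default:
		return "", errors.New("source and target gone, no recorded result: cannot pick a volume")
	}
}

func main() {
	_, err := pickAuthoritativeVolume(migrationState{})
	fmt.Println(err)
}
```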

>>> @jean-edouard could you please open an upstream issue for https://issues.redhat.com/browse/CNV-50720
>>
>> Wouldn't it be basically the same as #13112 ?
>
> It's similar: even if we fix #13112, we will still have a problem when both the source and the target crash.
>
> In #13112, the controller can observe that the target pod is happily running on the target already, so it can finalize the migration (one possible solution).
>
> In the case when both the source and the target are gone, we don't know how to finalize the migration, because we don't know which volume was updated last.
>
> As it is today, if the source pod is gone without communicating the results to the handler, we cannot tell which volume to choose when the VM starts again.

@vladikr do you think we could solve this if it was also possible to query the migration status from the target libvirt? That way we would have 2 sources for this information, so if one is gone, we can always use the other.

alicefr avatar Nov 19 '24 12:11 alicefr
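A small sketch of the fallback alicefr is suggesting: if migration status can be obtained from either the source or the target libvirt, losing one of them is no longer fatal. The statusSource interface below is a hypothetical stand-in, not an actual KubeVirt or libvirt API.

```go
// A hypothetical stand-in, not a KubeVirt or libvirt API: query migration
// status from the source first and fall back to the target, so the answer
// survives as long as one of the two is still reachable.
package main

import (
	"errors"
	"fmt"
)

type statusSource interface {
	MigrationCompleted() (bool, error)
}

func migrationCompleted(source, target statusSource) (bool, error) {
	if done, err := source.MigrationCompleted(); err == nil {
		return done, nil
	}
	if done, err := target.MigrationCompleted(); err == nil {
		return done, nil
	}
	return false, errors.New("neither source nor target is reachable")
}

// Two toy implementations for the usage example below.
type gone struct{}

func (gone) MigrationCompleted() (bool, error) { return false, errors.New("pod gone") }

type finished struct{}

func (finished) MigrationCompleted() (bool, error) { return true, nil }

func main() {
	done, err := migrationCompleted(gone{}, finished{})
	fmt.Println(done, err) // true <nil>: the target still knows the outcome
}
```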

Just a reminder: we should update the user-guide regarding this.

iholder101 avatar Dec 24 '24 14:12 iholder101