
Enable Node Convert to base k3s.

andrewd-zededa opened this issue 7 months ago • 4 comments

Description

The goal of this PR is to enable a base-k3s mode for HV=kubevirt EVE-OS. In the existing HV=kubevirt mode, the k3s node installs a series of components after first boot to enable running virtual machines and redundant storage. With this PR, the controller can send config declaring that EVE should change to a base-k3s mode/state; the PR implements the detection of that config and the conversion to the new mode.

  • Move component install steps into clean, dedicated functions.
  • Uninstall components when registration completes. This marks the conversion from an edge node storage cluster (which provides VMs on replicated storage) to a simpler Kubernetes cluster awaiting further config. The original configuration is restored later through the existing convert-to-single-node path when EdgeNodeClusterConfig is unpublished.
  • Enable all_components_initialized on arm64 by skipping the install of the kubevirt/cdi components. CDI is amd64-only, at least in this version.
  • Skip calling the Longhorn API when it is not expected to be available. The API is used in pillar's node drain path as well as for replicated volume instance metrics. The longhorn-system namespace is searched for to determine whether the Longhorn HTTP API should be reachable; the Kubernetes API server itself should always be available in HV=kubevirt (see the sketch after this list). For reference, the node drain path is documented in: https://github.com/lf-edge/eve/blob/master/pkg/pillar/docs/zedkube.md#kubernetes-node-draining
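
A rough client-go sketch of that namespace check (the package name, function name, and client wiring are assumptions, not the PR's actual code):

```go
// Minimal sketch (not the PR's actual code) of the gate for Longhorn HTTP API
// calls: only proceed if the longhorn-system namespace exists. Package name,
// function name, and client wiring are assumptions.
package kubeapi

import (
	"context"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// longhornAPIExpected reports whether the Longhorn HTTP API should be
// reachable, by asking the Kubernetes API server (always available under
// HV=kubevirt) for the longhorn-system namespace.
func longhornAPIExpected(ctx context.Context, cs kubernetes.Interface) (bool, error) {
	_, err := cs.CoreV1().Namespaces().Get(ctx, "longhorn-system", metav1.GetOptions{})
	if errors.IsNotFound(err) {
		return false, nil // base-k3s mode: skip node drain / volume metrics calls to Longhorn
	}
	if err != nil {
		return false, err // API server error: let the caller decide how to retry
	}
	return true, nil
}
```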

PR dependencies

https://github.com/lf-edge/eve/pull/4870

How to test and validate this PR

  • Configure an EVE controller to create a cluster of EVE-OS nodes, and generate a registration manifest.
  • eve enter kube
  • Verify the correct registration deployments are running on the cluster.
  • Verify that the longhorn-system, kubevirt, cdi, and multus namespaces are cleaned up (see the sketch after this list).
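
For the namespace cleanup check, a hypothetical client-go helper run against the node's kubeconfig could look like the following (the kubeconfig path and program structure are assumptions; listing namespaces with kubectl from inside `eve enter kube` gives the same answer):

```go
// Hypothetical verification helper (not part of the PR): report whether the
// component namespaces are gone after conversion. The kubeconfig path is an
// assumed k3s default.
package main

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)

	for _, ns := range []string{"longhorn-system", "kubevirt", "cdi", "multus"} {
		_, err := cs.CoreV1().Namespaces().Get(context.Background(), ns, metav1.GetOptions{})
		switch {
		case errors.IsNotFound(err):
			fmt.Printf("%s: cleaned up\n", ns)
		case err != nil:
			fmt.Printf("%s: error: %v\n", ns, err)
		default:
			fmt.Printf("%s: still present\n", ns)
		}
	}
}
```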

Changelog notes

PR Backports

Checklist

  • [x] I've provided a proper description
  • [ ] I've added the proper documentation (when applicable)
  • [x] I've tested my PR on amd64 device(s)
  • [ ] I've tested my PR on arm64 device(s)
  • [x] I've written the test verification instructions
  • [ ] I've set the proper labels to this PR

andrewd-zededa · Jun 11 '25 03:06

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 18.56%. Comparing base (fd35355) to head (250e056). Report is 19 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4954      +/-   ##
==========================================
- Coverage   24.97%   18.56%   -6.42%     
==========================================
  Files           8       18      +10     
  Lines        1185     2613    +1428     
==========================================
+ Hits          296      485     +189     
- Misses        820     2046    +1226     
- Partials       69       82      +13     

codecov[bot] · Jun 17 '25 23:06

Amended the commit to add:

  • missing cleanup of *.kubevirt.io node labels (see the sketch after this list)
  • a fix to allow updated/overwritten registration yaml from the controller, for cases where the initial EdgeNodeClusterConfig/Status was not sufficient to allow registration and the controller later sends an updated registration
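
A rough client-go illustration of the label cleanup mentioned above (the function name and client wiring are assumptions, not the amended commit's actual code):

```go
// Rough illustration (not the amended commit's code) of the node-label cleanup:
// drop every label whose key prefix is kubevirt.io or *.kubevirt.io from a node.
// Function name and client wiring are assumptions.
package kubeapi

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func removeKubevirtNodeLabels(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	changed := false
	for key := range node.Labels {
		prefix, _, hasPrefix := strings.Cut(key, "/")
		if hasPrefix && (prefix == "kubevirt.io" || strings.HasSuffix(prefix, ".kubevirt.io")) {
			delete(node.Labels, key)
			changed = true
		}
	}
	if !changed {
		return nil
	}
	// A JSON patch removing only the matched keys would be less racy;
	// a full update keeps the sketch short.
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```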

andrewd-zededa · Jun 18 '25 23:06

Moved back to draft as the plan may be changing for this PR

andrewd-zededa · Jun 19 '25 15:06

Mostly yetus cleanup and comments in the latest push, except for a change to use Registration_Applied as the condition for Update_CheckClusterComponents.

andrewd-zededa · Jun 26 '25 16:06

Yetus fixes and rebase off master.

andrewd-zededa · Jun 30 '25 13:06

/rerun red

andrewd-zededa · Jun 30 '25 14:06

How hard would it be to later flip this around so that eve-kubevirt only downloads and starts k3s, and then adds the other components once it receives something new in the API (its "personality")? And building on that, could there be an eve-k* image which doesn't even download and start k3s until it receives the personality in the API?

Where in this PR do you determine which pieces to start? (Note that this is not a comment on the PR itself or its readiness, but about future evolution.)

eriknordmark · Jun 30 '25 16:06

@eriknordmark currently the default mode of eve-k is that it should be able to deploy VM app instances, so all the extra components need to be available. If we'd prefer keeping a base-k3s-only personality in eve-k at first boot, then I think we will need to enhance eve-api to send the personality down as config and modify the controller to send the amended config when VM app instances need to be deployed.

I certainly see the benefits of having a smaller first-boot config, even if it's only for the faster time-to-use deployment. Would you like to see this reworked to enable that option? If so, I might recommend some type of eve-api config where a new boolean flag is defined in the top-level config, e.g. "support_vm_app_instances", which would need to be set at onboarding time so that pkg/kube can ensure the components get installed.
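
For illustration only, such a flag might gate the installs roughly like this (every type and function below is a hypothetical stand-in; none of it exists in eve-api or pkg/kube today):

```go
// Hypothetical sketch only: gating the extra component installs on a boolean
// such as "support_vm_app_instances". The install steps are stubs.
package main

import (
	"context"
	"fmt"
)

// NodePersonality is an assumed representation of controller-sent config.
type NodePersonality struct {
	SupportVMAppInstances bool // hypothetical top-level eve-api flag
}

// Stubs standing in for the real install steps.
func installBaseK3s(ctx context.Context) error  { return nil }
func installKubevirt(ctx context.Context) error { return nil }
func installCDI(ctx context.Context) error      { return nil }
func installLonghorn(ctx context.Context) error { return nil }
func installMultus(ctx context.Context) error   { return nil }

// installComponents installs base k3s always, and the VM/storage components
// only when the personality asks for VM app instance support.
func installComponents(ctx context.Context, p NodePersonality) error {
	if err := installBaseK3s(ctx); err != nil {
		return err
	}
	if !p.SupportVMAppInstances {
		return nil // base-k3s personality: stop here and await further config
	}
	for _, install := range []func(context.Context) error{
		installKubevirt, installCDI, installLonghorn, installMultus,
	} {
		if err := install(ctx); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := installComponents(context.Background(), NodePersonality{SupportVMAppInstances: false}); err != nil {
		fmt.Println("install failed:", err)
	}
}
```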

andrewd-zededa · Jun 30 '25 21:06

> @eriknordmark currently the default mode of eve-k is that it should be able to deploy VM app instances, so all the extra components need to be available. If we'd prefer keeping a base-k3s-only personality in eve-k at first boot, then I think we will need to enhance eve-api to send the personality down as config and modify the controller to send the amended config when VM app instances need to be deployed.
>
> I certainly see the benefits of having a smaller first-boot config, even if it's only for the faster time-to-use deployment. Would you like to see this reworked to enable that option? If so, I might recommend some type of eve-api config where a new boolean flag is defined in the top-level config, e.g. "support_vm_app_instances", which would need to be set at onboarding time so that pkg/kube can ensure the components get installed.

I think we can merge this as is and then figure out the API, including how that would be set from the controllers (the UX needs to be thought through).

eriknordmark · Jul 01 '25 00:07