cluster-api-provider-kubevirt
Provide an alternate qemu-guest-agent strategy to check VM's CAPI Sentinel file
Context
Following the last sync-up meeting (Tuesday, April 18), we are now targeting multiple new features:
- short term: bypass the Sentinel file check with a "none" policy
- mid term: work on an alternate qemu-guest-agent usage to check the file
- long term: make the whole SSH logic optional (keygen & cloud-init management, ...), with something like a Sentinel file check mode selector in the spec
This issue tracks the mid-term item (item 2).
Feature Description
Referring to https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/230, we need to check the bootstrap status using something other than SSH, and rely only on the KubeVirt apiserver for everything (scheduling, probing, Sentinel checking, ...).
https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/233 needs to be implemented before work starts on this issue.
Related Issues
- https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/230
- https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/233
A quick update on this issue:
I've been working on it for a couple of days, and I'm now quite confident that the current implementation creates major problems down the road.
I'll try to explain as clearly as I can what is blocking us & what should be changed to implement qemu-guest-agent based checks in the VM's virt-launcher pod.
What happens during the bootstrap check?
- Machine reconciliation is triggered in `kubevirtmachine_controller.go`
- The reconciler fetches & carries a global context with multiple compiled pieces of information (VM, VMI, KubevirtCluster, KubevirtMachine, SSH keys, ...)
- The reconciler builds the infra cluster Kubernetes client
- Depending on the KubeVirt topology (local or external), the machine controller either reuses the existing CAPK controller client (based on `controller-runtime`) or builds a new one from a kubeconfig (also based on `controller-runtime`)
- The bootstrap check is done in the `machine.go` implementation, which receives the global context and manipulates this controller-runtime client to configure & trigger an SSH check
- The bootstrap result is returned to the reconciler
What is the problem?
If we want to check the bootstrap status using an exec command, we need a Kubernetes client that supports such a feature.
But the controller-runtime client is not able to run exec commands on pods at all (https://github.com/kubernetes-sigs/controller-runtime/issues/1229).
So the only (good) way to trigger a pod exec from the code would be to build a new kube client based on `k8s.io/client-go`, which supports spawning pod exec commands.
Although that is easy to say, such a change requires extracting the controller-runtime client's configuration in order to stay compatible with both local and external setups. This configuration must therefore be extracted after building (or reusing) the infrastructure client (step 3 above).
Also, extracting the kube client configuration is not as simple as it looks. If we're reusing the local controller client, it is not just "parsing a kubeconfig file", and we might need to change multiple files across many packages, with a global impact on CAPK.
The main problem I see is that we cannot return the information needed to spawn a new client (`rest.Config{}`) when we're dealing with an internal infra client (https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/blob/main/pkg/infracluster/infracluster.go#L42-L44). So we'll have to keep specific code for the internal & external cases all the way down, which makes the whole configuration & client management a bit more complex.
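For reference, here is a minimal sketch of what a client-go based check could look like, assuming a recent `k8s.io/client-go` (for `StreamWithContext`) and an exec into the virt-launcher pod's `compute` container; the package name, function name, domain name and the exact guest-agent command are illustrative, not existing CAPK code:

```go
// Package name borrowed from the PoC branch; everything below is a sketch, not CAPK code.
package guestagentcheck

import (
	"bytes"
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// checkGuestAgent execs into the VM's virt-launcher pod and returns the command output.
// The whole call chain only needs a *rest.Config, which is exactly what the current
// internal infra client path does not expose.
func checkGuestAgent(ctx context.Context, cfg *rest.Config, namespace, podName string) (string, error) {
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return "", err
	}

	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace(namespace).
		Name(podName).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "compute", // assumption: virt-launcher's compute container
			// Illustrative guest-agent probe; "my-vmi-domain" is a placeholder, and the
			// real sentinel check command is still to be defined.
			Command: []string{"virsh", "qemu-agent-command", "my-vmi-domain", `{"execute":"guest-ping"}`},
			Stdout:  true,
			Stderr:  true,
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return "", err
	}

	var stdout, stderr bytes.Buffer
	if err := exec.StreamWithContext(ctx, remotecommand.StreamOptions{Stdout: &stdout, Stderr: &stderr}); err != nil {
		return "", err
	}
	return stdout.String(), nil
}
```

The point of the sketch is that everything hangs off a `*rest.Config`, which is exactly the piece the rest of this comment is about extracting.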
Why does it have a big impact?
So far, I've identified & tested several things we need to implement to support such a change:
- we need to make the infra cluster configuration available in the controller's global context
- this configuration cannot be extracted from an existing `controller-runtime` client, so it has to be captured during the client build stage
- the machine bootstrap checker should be aware of this configuration and build an ephemeral kube client using the `k8s.io/client-go` lib
- it should then use this client whenever a qemu-guest-agent check is requested
The `k8s.io/client-go` lib allows building a new REST client from a `rest.Config{}` struct. We can easily return it during client generation because we are already working with it (https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/blob/main/pkg/infracluster/infracluster.go#L75-L78).
Building a REST client also requires codecs & a GVK scheme that we may need to carry along as well (https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/client/apiutil?utm_source=godoc#RESTClientForGVK). A more "standard" client can also be created using the native client lib (https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/client#New), but managing schemes & GVKs is still a concern with that method.
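As a rough sketch of that option, assuming we derive both a typed client via `client.New` and a plain clientset from the same `rest.Config{}`; the helper name and the scheme registration below are assumptions:

```go
// Sketch: the two client flavors we could derive from one rest.Config. The typed
// controller-runtime client handles CRD spec/status reads; the plain clientset is
// what the exec-based sentinel check needs.
package guestagentcheck

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func buildClients(cfg *rest.Config) (client.Client, kubernetes.Interface, error) {
	// The scheme must know every GVK the typed client will touch: core types here,
	// plus any CRD schemes (e.g. KubeVirt's) registered the same way.
	s := runtime.NewScheme()
	if err := clientgoscheme.AddToScheme(s); err != nil {
		return nil, nil, err
	}

	crClient, err := client.New(cfg, client.Options{Scheme: s})
	if err != nil {
		return nil, nil, err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, nil, err
	}
	return crClient, clientset, nil
}
```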
Technical focus
In order to make this work, we'd need several changes:
- the `GenerateInfraClusterClient` func, in addition to returning the right `controller-runtime` client, should also return the `k8s.io/client-go/rest` `rest.Config` configuration
- this configuration should be carried along through the rest of the reconciliation context
- all controllers should ignore this new configuration except bootstrap_controller, which will pass it to the externalMachine object (`r.MachineFactory.NewMachine`)
- the externalMachine implementation will no longer use an existing `controller-runtime` client but will instead take a `rest.Config{}` as input and build a generic Go client that can both check CRD specs & status and run exec commands, depending on the bootstrap check mode
Since the `rest.Config{}` is sufficient to create a new client and is transparent for both local and remote configurations, it seems to be a good pivot to make the bootstrap check a bit more independent from the main controller kube client.
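A minimal sketch of that pivot, assuming the local topology reuses the manager's own config and the external one starts from the kubeconfig bytes stored in the infra cluster secret (the helper name is hypothetical):

```go
// Sketch: both topologies reduce to a single *rest.Config early on, and everything
// downstream (controller-runtime client, clientset, pod exec) is built from it.
package guestagentcheck

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	ctrl "sigs.k8s.io/controller-runtime"
)

// infraRESTConfig returns the infra cluster rest.Config. kubeconfigBytes is empty for
// the local topology and holds the external cluster kubeconfig otherwise.
func infraRESTConfig(kubeconfigBytes []byte) (*rest.Config, error) {
	if len(kubeconfigBytes) == 0 {
		// Local topology: reuse the manager's config (in-cluster or $KUBECONFIG).
		return ctrl.GetConfig()
	}
	// External topology: build the config from the kubeconfig stored in the secret.
	return clientcmd.RESTConfigFromKubeConfig(kubeconfigBytes)
}
```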
Proposed action items
- Probably extract the manager's main client config (in `main.go`) and load it into the reconciler's global context
- This would allow us to work exclusively with the rest client (https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/blob/main/main.go#L111), whether we're dealing with a local infra cluster or an external one
- Extract the `rest.Config{}` during controller client generation; update all dependencies & tests
- Make the `rest.Config{}` available in the context
- Adapt the machine controller & external machine implementation to support `rest.Config{}` for every kube request instead of the existing `controller-runtime` client; this is probably better than maintaining two clients for the same infra cluster in the same package (a sketch of the resulting shape follows this list)
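Purely to illustrate that last item, here is a hypothetical shape for externalMachine under this plan; the field and constructor names are assumptions, not the current CAPK code:

```go
// Hypothetical externalMachine shape: the factory passes the infra cluster rest.Config
// instead of a pre-built controller-runtime client, and the machine derives whichever
// client the selected bootstrap check mode needs (see the buildClients sketch above).
package guestagentcheck

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type externalMachine struct {
	restConfig *rest.Config         // carried from GenerateInfraClusterClient through the context
	crClient   client.Client        // typed client for VM/VMI spec & status reads
	clientset  kubernetes.Interface // plain clientset for the qemu-guest-agent exec path
}

func newExternalMachine(cfg *rest.Config) (*externalMachine, error) {
	crClient, clientset, err := buildClients(cfg)
	if err != nil {
		return nil, err
	}
	return &externalMachine{restConfig: cfg, crClient: crClient, clientset: clientset}, nil
}
```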
I've already started a proof of concept (https://github.com/BarthV/cluster-api-provider-kubevirt/tree/guestagentcheck), but the change is huge and I've realized it cannot fit in a single PR.
Let's discuss this plan (or any other possible ones) at the next meeting.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.