
Running e2e test fails in Tilt when multiple infra providers are deployed

pydctw opened this issue on Feb 28, 2022 (14 comments)

What steps did you take and what happened: Deploy multiple infra providers to Tilt. In this case, I deployed both CAPA and CAPG.

$ kubectl get providers -A
NAMESPACE                           NAME                    AGE   TYPE                     PROVIDER      VERSION
capa-system                         infrastructure-aws      10m   InfrastructureProvider   aws           v1.2.99
capg-system                         infrastructure-gcp      39h   InfrastructureProvider   gcp           v1.2.99
capi-kubeadm-bootstrap-system       bootstrap-kubeadm       34d   BootstrapProvider        kubeadm       v1.2.99
capi-kubeadm-control-plane-system   control-plane-kubeadm   34d   ControlPlaneProvider     kubeadm       v1.2.99
capi-system                         cluster-api             13d   CoreProvider             cluster-api   v1.2.99

Run e2e test - I ran quickstart test for CAPA. The test fails with the following message.

INFO: clusterctl config cluster quick-start-ssevy1 --infrastructure (default) --kubernetes-version v1.23.3 --control-plane-machine-count 1 --worker-machine-count 1 --flavor topology
...
Failed to run clusterctl config cluster
    Unexpected error:
        <*errors.fundamental | 0xc0006a49d8>: {
            msg: "failed to identify the default infrastructure provider. Please specify an infrastructure provider",
            stack: [0x2b5cf66, 0x2b5c5b3, 0x2b66ef8, 0x2b6a308, 0x2bbd828, 0x13367da, 0x13361a5, 0x133589b, 0x133b629, 0x133b007, 0x135c745, 0x135c465, 0x135bca5, 0x135df52, 0x136a2a5, 0x136a0be, 0x2bf076f, 0x1112862, 0x106cf01],
        }
        failed to identify the default infrastructure provider. Please specify an infrastructure provider
    occurred

What did you expect to happen: Be able to run e2e tests using a Tilt setup with multiple infra providers.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api version:
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

pydctw avatar Feb 28 '22 19:02 pydctw

First analysis is that clusterctl is unable to automatically detect the infra provider when multiple are deployed.

I have to take a closer look at whether there is a good way for the e2e test to set the infra provider during clusterctl config cluster.

sbueringer avatar Feb 28 '22 19:02 sbueringer
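
For illustration, the failure above comes from clusterctl being unable to derive a default when more than one infrastructure provider is installed. A minimal Go sketch of that decision rule (not clusterctl's actual code; the function name and wiring are hypothetical):

package main

import (
    "errors"
    "fmt"
)

// defaultInfrastructureProvider is an illustrative sketch, not clusterctl's
// real implementation: with no provider named explicitly, a default can only
// be derived when exactly one infrastructure provider is installed.
func defaultInfrastructureProvider(installed []string) (string, error) {
    if len(installed) == 1 {
        return installed[0], nil
    }
    return "", errors.New("failed to identify the default infrastructure provider. Please specify an infrastructure provider")
}

func main() {
    // With both CAPA and CAPG deployed, the lookup is ambiguous and errors out,
    // which is what the quick-start test runs into above.
    _, err := defaultInfrastructureProvider([]string{"aws", "gcp"})
    fmt.Println(err)
}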

Some more context:

  • config cluster is called by ApplyClusterTemplateAndWait.
  • ApplyClusterTemplateAndWait has a parameter InfrastructureProvider
  • We currently set it to DefaultInfrastructureProvider (which is the empty string) in all tests: https://grep.app/search?q=InfrastructureProvider%3A&filter[repo][0]=kubernetes-sigs/cluster-api

Potential solutions:

  1. Detect the infra provider based on the e2e config, i.e. if only one infra provider is configured in the e2e config, use that one instead of the default.
    • There is already an E2EConfig.InfrastructureProviders() func which returns all infra providers of an e2e config.
  2. Add an env var to our e2e tests to pin the infra provider, e.g. CLUSTERCTL_INFRA_PROVIDER, and use it, if set, instead of the default provider.

I would prefer the first option, but maybe there are other, better alternatives.

@fabriziopandini WDYT?

sbueringer avatar Mar 01 '22 13:03 sbueringer
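
For reference, a rough sketch of what option 2 (an env var override) could look like on the e2e side; the CLUSTERCTL_INFRA_PROVIDER variable and the helper are hypothetical, and the empty string keeps clusterctl's existing auto-detection:

package e2e

import "os"

// infrastructureProvider is a hypothetical helper for option 2: if the
// CLUSTERCTL_INFRA_PROVIDER env var is set, pin the e2e tests to that
// provider; otherwise return the empty string so clusterctl keeps trying
// to auto-detect the default provider.
func infrastructureProvider() string {
    if provider, ok := os.LookupEnv("CLUSTERCTL_INFRA_PROVIDER"); ok && provider != "" {
        return provider
    }
    return "" // empty string == keep the current defaulting behavior
}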

/milestone v1.2

@fabriziopandini WDYT?

I think it is ok for our E2E tests to use the first infrastructure provider from the docker e2e config file as a default. The only downside is that E2E is then not testing provider auto-detection, but as far as I remember this is covered by unit tests.

fabriziopandini avatar Mar 01 '22 14:03 fabriziopandini

I think it's fine if we only set the infra provider if there is one in the e2e config. This should cover 99% of the cases.

Otherwise I think we would break cases where multiple infra providers are intentionally deployed, as we would always use the first one for config cluster (although this currently doesn't work with our e2e tests anyway, since the infra provider defaulting we use doesn't handle multiple infra providers either).

sbueringer avatar Mar 01 '22 14:03 sbueringer

I'm not really worried about breaking cases where multiple infra providers are intentionally deployed, because this is not working today; also, I want to make it possible to deploy multiple providers because this could be useful, e.g. for starting to use kubemark.

Using the first infra provider as the default ticks a few boxes:

  • it covers, without breaking changes, the use cases where only one provider is defined (all the E2E tests today)
  • it makes it possible to have multiple providers and provides a simple defaulting rule to start playing around with these scenarios
  • it does not prevent using providers different from the default one

fabriziopandini avatar Mar 01 '22 14:03 fabriziopandini

Sounds good. Essentially we will support more cases than before (because currently the tests would just fail if multiple providers are deployed / defined in the e2e config).

Just to make sure we're talking about the same thing: we would change this code: https://github.com/kubernetes-sigs/cluster-api/blob/a82dfd5e079a6640c79c4baf175f63cb65213749/test/e2e/quick_start.go#L91 to:

InfrastructureProvider:   input.E2EConfig.InfrastructureProviders()[0],

across all e2e tests?

(we already validate that we have at least 1, so we're safe against panics: https://github.com/kubernetes-sigs/cluster-api/blob/5026786ee809c5def466049f5befd2a786fbcefa/test/framework/clusterctl/e2e_config.go#L491-L493)

sbueringer avatar Mar 01 '22 16:03 sbueringer

Basically yes, plus some docs for the providers to do the same. Nit: let's move how we compute the default infra provider into a func, e.g. E2EConfig.DefaultInfrastructureProvider(), so if we have to change the logic in the future we only have to touch one place.

fabriziopandini avatar Mar 01 '22 16:03 fabriziopandini
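
A minimal sketch of the E2EConfig.DefaultInfrastructureProvider() helper suggested above, assuming the existing InfrastructureProviders() accessor and the validation that at least one infra provider is configured; the body is illustrative, not a final implementation:

package clusterctl

// DefaultInfrastructureProvider returns the infra provider e2e tests should
// use when none is specified explicitly. Sketch only: it simply picks the
// first provider from the e2e config, which is safe because the config is
// already validated to contain at least one infrastructure provider.
func (c *E2EConfig) DefaultInfrastructureProvider() string {
    return c.InfrastructureProviders()[0]
}

The call site in quick_start.go (and the other e2e tests) would then read InfrastructureProvider: input.E2EConfig.DefaultInfrastructureProvider() instead of the hard-coded empty default.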

Sounds good to me.

@pydctw If you have some time, do you want to pick this up? :)

sbueringer avatar Mar 01 '22 17:03 sbueringer

Sure. I can work on it. /assign

pydctw avatar Mar 02 '22 00:03 pydctw

/milestone v1.2

fabriziopandini avatar Mar 07 '22 12:03 fabriziopandini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 05 '22 12:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 05 '22 13:07 k8s-triage-robot

/remove-lifecycle rotten

cprivitere avatar Jul 14 '22 20:07 cprivitere

/triage accepted /help-wanted

fabriziopandini avatar Aug 05 '22 17:08 fabriziopandini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 03 '22 18:11 k8s-triage-robot

/lifecycle frozen /unassign @pydctw /help

fabriziopandini avatar Nov 03 '22 20:11 fabriziopandini

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/lifecycle frozen /unassign @pydctw /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 03 '22 20:11 k8s-ci-robot

(doing some cleanup on old issues without updates) /close Unfortunately, no one is picking up the task; the thread will remain available for future reference.

fabriziopandini avatar Mar 24 '23 18:03 fabriziopandini

@fabriziopandini: Closing this issue.

In response to this:

(doing some cleanup on old issues without updates) /close Unfortunately, no one is picking up the task; the thread will remain available for future reference.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 24 '23 18:03 k8s-ci-robot