
clusterctl inside cluster in pod cannot find management cluster

Open steve-fraser opened this issue 2 years ago • 27 comments

What steps did you take and what happened:

  1. Deploy Pod in cluster
  2. Install vsphere provider
  3. Generate configuration:
     clusterctl generate cluster $(TEST_CLUSTER_NAME) \
       --infrastructure vsphere \
       -n $(TEST_CLUSTER_NAME) \
       --control-plane-machine-count 1 \
       --worker-machine-count 0 > /tmp/vsphere-test-cluster.yaml

Error: management cluster not available. Cannot auto-discover target namespace. Please specify a target namespace: invalid kubeconfig file; clusterctl requires a valid kubeconfig file to connect to the management cluster: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

What did you expect to happen:

It is supposed to find the local capi installation

Anything else you would like to add:

Environment:

  • Cluster-api version: v1.1.2
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version): v1.21.8
  • OS (e.g. from /etc/os-release):

runner@mvm-runner-2:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

/kind bug

steve-fraser avatar Mar 10 '22 16:03 steve-fraser

just fyi @Jont828

/area clusterctl

sbueringer avatar Mar 10 '22 16:03 sbueringer

Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

killianmuldoon avatar Mar 10 '22 16:03 killianmuldoon

> Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

Yes, I am running the clusterctl binary inside the mgmt cluster. Specifically, I am using it from a GitHub runner that runs inside the management cluster. This may be more of a feature request, but I expected not to need a kubeconfig at all; I thought clusterctl would behave like the kubectl binary does. kubectl works without dropping a config into the pod by using the pod's service account credentials and the injected env vars.

steve-fraser avatar Mar 10 '22 18:03 steve-fraser

Agreed. I think it would be nice if clusterctl just did in-cluster discovery, as controllers do.

It's not really nice if folks have to generate a kubeconfig somehow even though a Pod has the ServiceAccount credentials injected.

sbueringer avatar Mar 10 '22 18:03 sbueringer

/milestone v1.2

fabriziopandini avatar Mar 10 '22 20:03 fabriziopandini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 08 '22 20:06 k8s-triage-robot

/lifecycle frozen

fabriziopandini avatar Jun 09 '22 09:06 fabriziopandini

So applications like cluster autoscaler that run in the cluster initialize their client with InClusterConfig() which gets the kubeconfig of the current cluster or returns an ErrNotInCluster. We could modify the entry code for clusterctl to detect if it's in a cluster, and if it is, go ahead and use the in cluster config. Wdyt @fabriziopandini @sbueringer?
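A minimal sketch of that detection, written against the standard library only so it runs without client-go. It follows the same signals `rest.InClusterConfig()` relies on (the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables plus the mounted service account token). The names `detectInCluster`, `inClusterInfo`, and `errNotInCluster`, and the injected `getenv`/`readFile` parameters, are hypothetical and only for illustration; they are not clusterctl or client-go APIs.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// errNotInCluster is a stand-in for client-go's rest.ErrNotInCluster.
var errNotInCluster = errors.New("not running inside a Kubernetes pod")

// tokenPath is the default mount point of the ServiceAccount token in a pod.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// inClusterInfo is a hypothetical, trimmed-down substitute for *rest.Config.
type inClusterInfo struct {
	Host  string
	Token string
}

// detectInCluster returns the API server address and token when the process
// runs in a pod, or errNotInCluster otherwise. getenv and readFile are
// injected so the logic can be exercised outside a cluster.
func detectInCluster(getenv func(string) string, readFile func(string) ([]byte, error)) (*inClusterInfo, error) {
	host, port := getenv("KUBERNETES_SERVICE_HOST"), getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return nil, errNotInCluster
	}
	token, err := readFile(tokenPath)
	if err != nil {
		return nil, fmt.Errorf("reading service account token: %w", err)
	}
	return &inClusterInfo{
		Host:  "https://" + net.JoinHostPort(host, port),
		Token: string(token),
	}, nil
}

func main() {
	info, err := detectInCluster(os.Getenv, os.ReadFile)
	if errors.Is(err, errNotInCluster) {
		fmt.Println("not in a cluster; fall back to kubeconfig discovery")
		return
	}
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("in-cluster API server:", info.Host)
}
```

In real code the check would presumably be a direct call to `rest.InClusterConfig()`; the sketch only makes the decision point visible.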

Jont828 avatar Jun 14 '22 23:06 Jont828

I think something like the following should be fine:

  • if --kubeconfig is set use that one
  • if in cluster use in cluster config

Not sure at which point we should check for the default kubeconfig, but that might be already handled by the client-go util funcs which are usually used for this.

sbueringer avatar Jun 15 '22 04:06 sbueringer

So for the in-cluster config there are two approaches we could take. We could take the approach you outlined where we check for it, and if we get an ErrNotInCluster we suppress it and move on to the default kubeconfig discovery rules. Alternatively, we could add a flag to pass in the in-cluster config and if it's set, we skip the default kubeconfig discovery rules. I think the benefit of the latter approach is that developers trying to initialize the client can handle the ErrNotInCluster cases themselves instead of having it done in the background. Wdyt?
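The difference between the two approaches can be sketched roughly as follows. This is an illustration under assumptions, not clusterctl code: `loader`, `autoDiscover`, `withFlag`, and `errNotInCluster` are hypothetical names, and the loaders return a plain string where real code would return a `*rest.Config`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotInCluster is a stand-in for client-go's rest.ErrNotInCluster.
var errNotInCluster = errors.New("not running inside a Kubernetes pod")

// loader returns a description of the config it found (hypothetical;
// a real loader would return a *rest.Config).
type loader func() (string, error)

// autoDiscover is the first approach: try the in-cluster config, suppress
// errNotInCluster, and fall back to the default kubeconfig discovery rules.
func autoDiscover(inCluster, kubeconfig loader) (string, error) {
	cfg, err := inCluster()
	if err == nil {
		return cfg, nil
	}
	if !errors.Is(err, errNotInCluster) {
		return "", err // a real failure, e.g. an unreadable token file
	}
	return kubeconfig()
}

// withFlag is the second approach: a flag selects the in-cluster config
// explicitly, and errNotInCluster is returned to the caller unchanged.
func withFlag(useInCluster bool, inCluster, kubeconfig loader) (string, error) {
	if useInCluster {
		return inCluster()
	}
	return kubeconfig()
}

func main() {
	notInPod := func() (string, error) { return "", errNotInCluster }
	fromFile := func() (string, error) { return "config from kubeconfig file", nil }

	cfg, err := autoDiscover(notInPod, fromFile)
	fmt.Println(cfg, err)

	_, err = withFlag(true, notInPod, fromFile)
	fmt.Println(err)
}
```

With the first function the caller never sees `errNotInCluster`; with the second, the caller decides what to do with it.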

Jont828 avatar Jun 16 '22 00:06 Jont828

I would really prefer if it's just auto-discovery and simply works out of the box without anyone having to specify a special flag for it.

Let's take a look at how kubectl does it. Afaik it automatically works in a Pod / on a local env

sbueringer avatar Jun 17 '22 05:06 sbueringer

Sounds good. I'll take a look at kubectl's implementation when I get the chance and follow up here.

Jont828 avatar Jun 23 '22 22:06 Jont828

We also need this. We want to use clusterctl backup in a CronJob in the management cluster. As @sbueringer mentioned, I'd expect this to work like most other k8s clients using https://github.com/kubernetes/client-go/blob/master/rest/config.go#L512, which works out of the box when running inside the cluster.

Jacobious52 avatar Jun 30 '22 04:06 Jacobious52

@sbueringer I'm happy to take a stab at this issue but I'll probably need some help since I'm not very familiar with this code.

I looked at Cluster Autoscaler and here they have some logic that uses the in cluster config. I believe their idea is to have an interface that has one implementation using a kubeconfig file and another implementation using info from the InClusterConfig().

The closest thing I can find is proxy.go where we have an interface that implements certain functions like GetConfig() and CurrentNamespace(). Do you know if we could simply make another implementation of the Proxy interface, or is there other code we would want to change as well?

Jont828 avatar Jul 06 '22 23:07 Jont828

As for kubectl, I tried running it on a pod but it seems like it doesn't work out of the box.

root@capi-test-control-plane:/# kubectl get pods -A
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" at the cluster scope

It seems like we need to set up permissions for it to work, and as a result I'm not too clear on how to find the relevant code in their repo.

Jont828 avatar Jul 06 '22 23:07 Jont828

I did a bit more research and I think in general the behavior of controller-runtime matches relatively closely to what we want for clusterctl: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/config/config.go#L43-L61 (unfortunately except the --kubeconfig flag because clusterctl has its own)
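As I read the doc comment at that link, the lookup order is: the `--kubeconfig` flag, then the `KUBECONFIG` environment variable, then the in-cluster config, then `$HOME/.kube/config`. A small sketch of that precedence as a pure function follows; `inputs`, `source`, and `pickSource` are hypothetical names, and the code only decides which source would win rather than loading anything.

```go
package main

import "fmt"

// source names where the client configuration would come from.
type source string

const (
	fromFlag      source = "--kubeconfig flag"
	fromEnv       source = "KUBECONFIG environment variable"
	fromInCluster source = "in-cluster config"
	fromHome      source = "$HOME/.kube/config"
	none          source = "none"
)

// inputs captures the facts the precedence depends on (hypothetical type).
type inputs struct {
	FlagPath   string // value of --kubeconfig, empty if unset
	EnvPath    string // value of KUBECONFIG, empty if unset
	InCluster  bool   // true when ServiceAccount credentials are injected
	HomeExists bool   // true when $HOME/.kube/config exists
}

// pickSource applies the precedence: flag, env var, in-cluster, home file.
func pickSource(in inputs) source {
	switch {
	case in.FlagPath != "":
		return fromFlag
	case in.EnvPath != "":
		return fromEnv
	case in.InCluster:
		return fromInCluster
	case in.HomeExists:
		return fromHome
	default:
		return none
	}
}

func main() {
	// A pod with injected ServiceAccount credentials and no kubeconfig anywhere.
	fmt.Println(pickSource(inputs{InCluster: true})) // prints "in-cluster config"
}
```

Under this order an explicit flag still wins inside a pod, which matches the "if --kubeconfig is set use that one" rule suggested earlier in the thread.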

I think the clusterctl library (cluster.New) should ideally take a *rest.Config as input parameter instead of the path to a kubeconfig file. This way it can be used in various scenarios and it doesn't depend on a literal file on a disk.

But I have no idea if a change like this is acceptable and how much refactoring this would require.

sbueringer avatar Jul 07 '22 06:07 sbueringer

Kubernetes has a kubernetes.NewForConfig(*rest.Config) function that does this. We could copy that pattern and add a new function to cover the case where we want to create a clusterctl client from a rest.Config, i.e. cluster.NewForConfig(*rest.Config).

killianmuldoon avatar Jul 07 '22 10:07 killianmuldoon

Maybe we can keep the external API the same, by:

  • keeping cluster.New as is
  • adding cluster.NewForConfig which takes rest.Config

And then refactoring internally behind the API so that we don't have to write a temporary kubeconfig with credentials somewhere?
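A rough sketch of that shape, with the existing path-based constructor delegating to the new one. Everything here is a stand-in: `restConfig` replaces client-go's `rest.Config`, `client` replaces the clusterctl cluster client, `kubeconfigLoader` is a hypothetical helper, and the signatures are simplified and do not match the real `cluster.New`.

```go
package main

import (
	"errors"
	"fmt"
)

// restConfig is a stand-in for client-go's rest.Config.
type restConfig struct {
	Host        string
	BearerToken string
}

// client is a hypothetical stand-in for the clusterctl cluster client.
type client struct {
	cfg *restConfig
}

// kubeconfigLoader turns a kubeconfig path into a restConfig (hypothetical).
type kubeconfigLoader func(path string) (*restConfig, error)

// NewForConfig is the proposed additional constructor: it needs no file on disk.
func NewForConfig(cfg *restConfig) (*client, error) {
	if cfg == nil || cfg.Host == "" {
		return nil, errors.New("a config with a non-empty Host is required")
	}
	return &client{cfg: cfg}, nil
}

// New keeps a path-based entry point and delegates to NewForConfig, so no
// temporary kubeconfig has to be written for the in-cluster case.
func New(kubeconfigPath string, load kubeconfigLoader) (*client, error) {
	cfg, err := load(kubeconfigPath)
	if err != nil {
		return nil, fmt.Errorf("loading kubeconfig %q: %w", kubeconfigPath, err)
	}
	return NewForConfig(cfg)
}

func main() {
	// In-cluster callers build the config themselves and skip the file entirely.
	c, err := NewForConfig(&restConfig{Host: "https://10.96.0.1:443", BearerToken: "example-token"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client for", c.cfg.Host)
}
```

How much of clusterctl's internals would have to change to sit behind such a constructor is the open question raised above; the sketch only shows the external shape.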

sbueringer avatar Jul 07 '22 11:07 sbueringer

I'll take a look at this and see what's possible (looking at the code, it's not as trivial as I thought :laughing:).

/assign

killianmuldoon avatar Jul 07 '22 11:07 killianmuldoon

@killianmuldoon Sounds good! I started hacking on some ideas on my end. In proxy.go it seems like if we refactor to initialize it with a *rest.Config we could rework the other functions. One thing I'm not sure about is if we have access to a kubecontext from the rest.Config. For some of the other Proxy interface functions we could try to do something like this (from cluster autoscaler):

// CurrentNamespace returns the namespace from the current context in the kubeconfig file.
func (k *inClusterProxy) CurrentNamespace() (string, error) {
	// This way assumes you've set the POD_NAMESPACE environment variable using the downward API.
	// This check has to be done first for backwards compatibility with the way InClusterConfig was originally set up
	if ns := os.Getenv("POD_NAMESPACE"); ns != "" {
		return ns, nil
	}

	// Fall back to the namespace associated with the service account token, if available
	if data, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace"); err == nil {
		if ns := strings.TrimSpace(string(data)); len(ns) > 0 {
			return ns, nil
		}
	}

	return "default", nil
}

Jont828 avatar Jul 07 '22 22:07 Jont828

/triage accepted

fabriziopandini avatar Aug 05 '22 17:08 fabriziopandini

dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth

/help

fabriziopandini avatar Nov 02 '22 14:11 fabriziopandini

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 02 '22 14:11 k8s-ci-robot

Our organization is looking to create vclusters in our CI/CD pipeline, which runs jobs as Kubernetes pods, and clusterctl not being able to detect it's running in a pod like kubectl is somewhat blocking us from doing so (we can use vcluster directly)

robbie-demuth avatar Jan 26 '23 21:01 robbie-demuth

@robbie-demuth it would be great if someone from your organization could help in getting this fixed. I will be happy to help in getting this over the line.

fabriziopandini avatar Jan 27 '23 09:01 fabriziopandini

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Jan 27 '24 10:01 k8s-triage-robot

Any updates on this?

mjnovice avatar Mar 02 '24 03:03 mjnovice

/priority backlog

fabriziopandini avatar Apr 12 '24 14:04 fabriziopandini

The Cluster API project currently lacks enough contributors to adequately respond to all issues and PRs.

We are keeping this issue around since folks have asked about it recently, but if no one volunteers for the job, we will most probably close it at the next iteration.

/triage accepted
/remove-lifecycle frozen

fabriziopandini avatar Apr 23 '24 13:04 fabriziopandini