cluster-api-provider-azure Add some sanity check for controller credentials

/kind feature

Describe the solution you'd like Right now, if one deploys CAPZ using Tilt and forgets to specify some credentials via environment variables, CAPz controller runs as usual until you try deploying a cluster, which then fails.

It would be nice to do any kind of check for credentials and either reject cluster operations or crash if the credentials are wrong. This would provide better experience for people deploying and developing CAPz.

Environment:

cluster-api-provider-azure version: 02ca6c383c4c459efb79c8d3673f332f212f4046

Mar 15 '21 18:03 invidian

The CAPZ controller does not require credentials. In fact, the controller credentials will be deprecated in a future version.

Each cluster should have it's own identity reference. Please take a look at https://capz.sigs.k8s.io/topics/multitenancy.html. wdyt?

Mar 16 '21 14:03 devigned

Each cluster should have it's own identity reference. Please take a look at https://capz.sigs.k8s.io/topics/multitenancy.html. wdyt?

This makes it IMO even more important to do sanity checks, if it's possibly up to the user to provide the credentials. Is there anything like this in place?

Mar 16 '21 18:03 invidian

What would you think about validating the cluster by checking the ENV for creds and if missing, require the cluster to have an identity ref?

@nader-ziada wdyt?

Mar 16 '21 18:03 devigned

the identityRef is part of the workload cluster definition, so that check would happen at the creation time of that cluster which would not help with the problem described here in this issue.

The problem with checking the credentials at the time of creating the management cluster is that it would require an extra call to azure to validate its values, which would be an extra step that some would argue is not needed, otherwise just checking for the existence of some random values would not necessarily be valuable

Mar 16 '21 19:03 nader-ziada

the identityRef is part of the workload cluster definition, so that check would happen at the creation time of that cluster which would not help with the problem described here in this issue.

I was just saying we check for a valid set of env vars, like AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET are not "" when a workload cluster is being created. If there isn't a valid set of env vars and no identityRef, reject the cluster.

I agree we would have a tough time figuring out if creds are valid, but we have an easy time figuring out if they are present. If we know they are not present, then we know that a workload cluster will not be able to be deployed if there is no identityRef.

I think that will be a common error case until identityRef is the only way.

Mar 16 '21 21:03 devigned

I think that will be a common error case until identityRef is the only way.

yes, makes sense

Mar 16 '21 21:03 nader-ziada

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

Jun 14 '21 22:06 fejta-bot

/remove-lifecycle stale

Jun 16 '21 10:06 invidian

wouldn't be this function validating the credentials as requested?

Jul 05 '21 15:07 fiunchinho

wouldn't be this function validating the credentials as requested?

That's a decent short term solution, though I think something more nested in the controller itself would be nice as well.

Jul 05 '21 15:07 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Oct 03 '21 15:10 k8s-triage-robot

/remove-lifecycle stale

Oct 04 '21 07:10 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jan 02 '22 07:01 k8s-triage-robot

/remove-lifecycle stale

Jan 02 '22 08:01 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 02 '22 08:04 k8s-triage-robot

/remove-lifecycle stale

@invidian we already validate for the existence of AZURE_CLIENT_ID and AZURE_CLIENT_SECRET env vars. Additionally, Capz expects AZURE_CLIENT_SECRET to be set as a kubernetes secret as well. Do you think that checking that will improve the experience?

Apr 08 '22 20:04 shysank

@shysank is it validated by the controller itself or only via Tilt/clusterctl?

Apr 10 '22 14:04 invidian

Validation is (will be) done on the client (tilt), Controller will return an error if it can't find the secret.

Apr 12 '22 16:04 shysank

Hmm, based on my recent development work, I still think the situation could be improved. Right now if one passes wrong credentials, for example with no access to subscription at all and tries to create a cluster, the cluster cannot even be removed, which is quite annoying as one have to remove the finalizers by hand to get rid of the objects.

May 03 '22 07:05 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Aug 01 '22 07:08 k8s-triage-robot

/remove-lifecycle stale

Aug 01 '22 08:08 jackfrancis

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Oct 30 '22 08:10 k8s-triage-robot

/remove-lifecycle stale

Oct 30 '22 16:10 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jan 28 '23 16:01 k8s-triage-robot

/remove-lifecycle stale

Jan 29 '23 11:01 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 29 '23 12:04 k8s-triage-robot

/remove-lifecycle stale

Apr 30 '23 15:04 invidian

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jan 19 '24 10:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Feb 18 '24 10:02 k8s-triage-robot

/remove-lifecycle rotten

Feb 20 '24 19:02 nawazkh