Cluster name regex validation rejects valid Kubernetes object names
/kind bug
What steps did you take and what happened:
- Try to create a cluster with a name starting with a number, for example 1cluster-test.
- Cluster creation fails with the error:
  "msg"="Reconciler error" "error"="failed to reconcile cluster services: failed to reconcile public IP: cannot create public IP: network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code=\"InvalidDomainNameLabel\" Message=\"The domain name label 1cluster-test is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$.
- Kubernetes resource names follow DNS subdomain naming rules (RFC 1123), which allow names that start with a number.
- This behavior is also inconsistent with other Cluster API providers.
- The cluster name validation regex is ^[a-z0-9][a-z0-9-]{0,42}[a-z0-9]$ (disallowed characters: /"'[]:|<>+=;,.?*@&; the name cannot start with an underscore or end with a period or hyphen, and periods are avoided so the name can be used as part of a DNS name). A name like 1cluster-test therefore passes cluster name validation but fails validation for other Azure resources, as the sketch below illustrates.
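To make the mismatch concrete, here is a minimal standalone Go sketch (not CAPZ code) that checks the example name against the two patterns quoted above:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Cluster name validation pattern quoted in this issue (CAPZ webhook).
	clusterNameRE := regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,42}[a-z0-9]$`)
	// Azure domain name label pattern from the InvalidDomainNameLabel error above.
	domainLabelRE := regexp.MustCompile(`^[a-z][a-z0-9-]{1,61}[a-z0-9]$`)

	name := "1cluster-test"
	fmt.Println("cluster name validation:", clusterNameRE.MatchString(name)) // true
	fmt.Println("Azure domain name label:", domainLabelRE.MatchString(name)) // false: leading digit is rejected
}
```

The name passes the first check but fails the second, which is exactly the gap between what the webhook accepts and what Azure later rejects.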
What did you expect to happen:
- I expect to be able to create a cluster with a name starting with a number, and for the CAPZ controller to work out how not to use that name directly for the Azure resources that have stricter naming requirements.
Anything else you would like to add:
Environment:
- cluster-api-provider-azure version: v0.5.2
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
> I expect to be able to create a cluster with a name starting with a number, and for the CAPZ controller to work out how not to use that name directly for the Azure resources that have stricter naming requirements.
Would you expect the webhook to fail if any of the defaulted resource names are invalid (for example, 1cluster-test is not a valid resource group name, and if not provided, the resource group name defaults to the cluster name)? Or should the webhooks somehow know how to default the resource group and other resource names to a valid name? That seems dangerous: how do we ensure there are no collisions with other names if we modify the provided one? 1cluster-test could become cluster-test, but then it would collide with cluster 2cluster-test, which would also land in resource group cluster-test, as the sketch below makes concrete.
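To illustrate the collision risk, a naive sanitizer that just strips a leading digit (purely hypothetical, not something CAPZ does today) would map both names to the same resource group:

```go
package main

import (
	"fmt"
	"strings"
)

// stripLeadingDigits is a hypothetical, naive sanitizer shown only to
// illustrate the collision risk of rewriting user-provided names.
func stripLeadingDigits(name string) string {
	return strings.TrimLeft(name, "0123456789")
}

func main() {
	fmt.Println(stripLeadingDigits("1cluster-test")) // cluster-test
	fmt.Println(stripLeadingDigits("2cluster-test")) // cluster-test -- same resource group name, collision
}
```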
My main concern is that this is not consistent with other providers, so I'm wondering if there is a way to add a prefix or something similar to the internal resource names to avoid this situation.
Instead of adding a prefix in the background, which can cause name collisions as Cecile has pointed out, could we have a flag that indicates the user is willing to take that risk? The webhook would fail when the flag is set to false, with an appropriate error message nudging the user to set it to true. WDYT?
Rather than a flag, I would prefer that if the user defines a cluster name that is not a valid resource name (e.g. a resource group name), we fail validation unless they explicitly provide a valid resource group name; a rough sketch of that check follows below. I think allowing users to use the webhook in a way that can cause collisions, whether through a flag or not, is too risky and could cause them to shoot themselves in the foot. They might be tempted to set the flag just to fix the error quickly, without really understanding its meaning and impact, and that could cause fatal errors down the line.
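A rough sketch of what that validation could look like; the function and field names here are illustrative assumptions, not the actual CAPZ webhook code:

```go
package webhooks

import (
	"fmt"
	"regexp"
)

// azureSafeName is an assumed pattern for a name that can be reused as-is
// for Azure resources such as resource groups and domain name labels.
var azureSafeName = regexp.MustCompile(`^[a-z][a-z0-9-]{1,61}[a-z0-9]$`)

// validateClusterNaming is a hypothetical check: if the cluster name cannot
// be reused as an Azure resource name, require an explicit resource group
// name instead of silently rewriting the cluster name.
func validateClusterNaming(clusterName, resourceGroup string) error {
	if azureSafeName.MatchString(clusterName) {
		return nil // the cluster name itself is usable for Azure resources
	}
	if resourceGroup == "" {
		return fmt.Errorf("cluster name %q is not a valid Azure resource name; please set an explicit resource group name", clusterName)
	}
	return nil
}
```

This keeps the failure at admission time, with a clear message, rather than at reconcile time as in the error above.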
The question I have is: are there any Azure resources that we name based on k8s object names that are not configurable right now?
Totally agree that a flag is not ideal. It is more of a workaround to get a unified experience across providers. To provide more context, we have a situation in VMware where we have to restrict cluster names for other providers (e.g. AWS) even though those providers allow numerically prefixed names.
> The question I have is: are there any Azure resources that we name based on k8s object names that are not configurable right now?
Off the top of my head, I know for sure that availability set specs are not exposed to the user.
On a side note, are there any plans (if possible) to relax the naming constraints in Azure?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, lifecycle/stale is applied
> - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
> - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
> You can:
> - Reopen this issue or PR with /reopen
> - Mark this issue or PR as fresh with /remove-lifecycle rotten
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten /reopen
@CecileRobertMichon: Reopened this issue.
In response to this:
> /remove-lifecycle rotten /reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, lifecycle/stale is applied
> - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
> - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
> You can:
> - Reopen this issue or PR with /reopen
> - Mark this issue or PR as fresh with /remove-lifecycle rotten
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.