kube-openapi icon indicating copy to clipboard operation
kube-openapi copied to clipboard

Add k8s name related formats

Open jpbetz opened this issue 2 years ago • 28 comments

This adds the three most heavily used name formats in k8s:

  • k8s.io/dns-1035-label
  • k8s.io/dns-1035-label-prefix
  • k8s.io/dns-1123-label
  • k8s.io/dns-1123-label-prefix
  • k8s.io/dns-1123-subdomain
  • k8s.io/dns-1123-subdomain-prefix

Note: These formats implemented in k8s in ways that do not exactly match the RFCs (https://github.com/kubernetes/kubernetes/issues/71140, https://github.com/kubernetes/kubernetes/issues/79351). But, intend to use these formats to communicate the existing Kubernetes validation rules via OpenAPI, so we want to keep them as-is.

Benefits:

  • Easier for CRDs to use the same name format already used in k8s.
  • Useful for publication of OpenAPI from Declarative Validation

xref: https://json-schema.org/draft/2020-12/json-schema-validation#name-custom-format-attributes, https://spec.openapis.org/registry/format/

jpbetz avatar Apr 12 '23 16:04 jpbetz

are non-standard format strings supposed to be namespaced or not?

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

liggitt avatar Apr 12 '23 16:04 liggitt

are non-standard format strings supposed to be namespaced or not?

I have not found any mention of namespacing in the specs. xref https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-01#name-custom-format-attributes

I am open to naming suggestions for these formats though. I had briefly considered something like "kubernetes-name" but couldn't think of anything that aided rather than harmed clarity on what the format actually is.

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

Yes, exactly.

Today, supported formats is used strip out unsupported formats when used in older version of the kube-apiserver. We could extend this to have two lists: supported and storage. All new formats will be added to storage for at least one minor release of Kubernetes before the format is added to supported so that any usages of the new formats written to CRDs downgrade safely.

jpbetz avatar Apr 12 '23 20:04 jpbetz

/assign @apelisse

Would you be willing to review?

jpbetz avatar Apr 26 '23 16:04 jpbetz

Thanks, looks better to me :-)

apelisse avatar Apr 26 '23 17:04 apelisse

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

Yes, exactly.

Today, supported formats is used strip out unsupported formats when used in older version of the kube-apiserver. We could extend this to have two lists: supported and storage. All new formats will be added to storage for at least one minor release of Kubernetes before the format is added to supported so that any usages of the new formats written to CRDs downgrade safely.

I'm not sure I understand how that makes existing persisted CR data safe. However long we take to roll out support for the format, at some point, would we start treating existing persisted data that didn't match the specified format as invalid?

liggitt avatar Apr 27 '23 19:04 liggitt

I'm not sure I understand how that makes existing persisted CR data safe. However long we take to roll out support for the format, at some point, would we start treating existing persisted data that didn't match the specified format as invalid?

We could solve this together with the more general problem of CRD validation ratcheting. I.e. the case where a CRD author adds a format to a CRD, and there already persisted CR data which then immediately becomes invalid. The case we are discussing is similar except that instead of the CRD author adding a format, the CRD author already had a non-validating format in the CRD and then Kubernetes changed the format to be validating.

Ratcheting rule would be: if any CR data fails validation but is unchanged from the persisted value, we ignore the validation failure and allow the update.

EDIT: This is also what we need for rollback. Note that because any format is allowed on any CRD, this would allow for rollback to arbitrary old versions.

jpbetz avatar May 04 '23 21:05 jpbetz

We could solve this together with the more general problem of CRD validation ratcheting.

I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this? This would be the first example I can think of where we made a change that would reinterpret existing persisted CRDs in ways that would invalidate existing data

liggitt avatar May 09 '23 18:05 liggitt

We could solve this together with the more general problem of CRD validation ratcheting.

I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this? This would be the first example I can think of where we made a change that would reinterpret existing persisted CRDs in ways that would invalidate existing data

SGTM. I'll poke around and see who might be able to implement.

jpbetz avatar May 09 '23 19:05 jpbetz

/close When we circle back on this we'll either reopen or create a new PR.

jpbetz avatar Jul 21 '23 01:07 jpbetz

@jpbetz: Closed this PR.

In response to this:

/close When we circle back on this we'll either reopen or create a new PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 21 '23 01:07 k8s-ci-robot

We could solve this together with the more general problem of CRD validation ratcheting. I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this?

Just updating progress on this front was made with CRD Ratcheting Alpha added in 1.28: https://kep.k8s.io/4008

Will need to wait for it to mature before using it for name formats, but in the interim we can use inline OpenAPI schemas to describe the regex/cel expressions used for most formats.

alexzielenski avatar Aug 18 '23 23:08 alexzielenski

/reopen

CRD Ratcheting was promoted to beta in 1.30, freeing us up to resume progress on this.

jpbetz avatar Jun 28 '24 17:06 jpbetz

@jpbetz: Reopened this PR.

In response to this:

/reopen

CRD Ratcheting was promoted to beta in 1.30, freeing us up to resume progress on this.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 28 '24 17:06 k8s-ci-robot

Now that we have CRD ratcheting, it would be good to give guidance around when expanded openapi formats are the preferred way to model a validation, and when a CEL library is.

xref https://github.com/kubernetes/kubernetes/issues/124490#issuecomment-2741495825

liggitt avatar Mar 20 '25 19:03 liggitt

Now that we have CRD ratcheting, it would be good to give guidance around when expanded openapi formats are the preferred way to model a validation, and when a CEL library is.

xref kubernetes/kubernetes#124490 (comment)

We are missing well documented guidance. Let's get that going. I like the idea of CEL "supplementing" OpenAPI formats only were clearly needed.

Related- We've fallen behind on supporting formats. https://github.com/kubernetes/kubernetes/issues/130639 describes where we haveCEL support for formats that SHOULD have an OpenAPI format.

To help catch up, we've added emulation version support to formats: https://github.com/kubernetes/kubernetes/pull/130783. Combined with ratcheting, this should give us all the machinery we need to add the formats we agree should be added.

jpbetz avatar Mar 21 '25 01:03 jpbetz

cc @thockin for the Declarative Validation aspect. We anticipate needing OpenAPI formats for the built-in formats that we validate with hand written code so that we can publish the declarative validation rules through OpenAPI.

jpbetz avatar Mar 21 '25 01:03 jpbetz

Is there an extension-mechanism for format names? Like x-something or example.com/something?

The ultimate goal, I think is unification. CRDs can be written saying "this field is validated by format=x-k8s-dns-label" and we will do that server-side; Builtin types can express the same thing in a different way; apiserver publishes this in a way that an "enlightened" client can handle if they wanted to to do shift-left validation (e.g. as a git pre-commit).

thockin avatar Mar 21 '25 23:03 thockin

Is there an extension-mechanism for format names? Like x-something or example.com/something?

Unfortunately no. The RFC drafts that describe "format" don't provide an extension mechanism. https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2 proposes a core set of formats but the only wording for extensions is:

Implementations MAY add custom format attributes.  Save for agreement
   between parties, schema authors SHALL NOT expect a peer
   implementation to support this keyword and/or custom format
   attributes.

jpbetz avatar Mar 21 '25 23:03 jpbetz

Should we make one up?

On Fri, Mar 21, 2025 at 4:54 PM Joe Betz @.***> wrote:

Is there an extension-mechanism for format names? Like x-something or example.com/something?

Unfortunately no. The RFC drafts that describe "format" don't provide an extension mechanism. https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2 proposes a core set of formats but the only wording for extensions is:

Implementations MAY add custom format attributes. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support this keyword and/or custom format attributes.

— Reply to this email directly, view it on GitHub https://github.com/kubernetes/kube-openapi/pull/384#issuecomment-2744695658, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVAGQL2FXR3MJJI2GAL2VSREDAVCNFSM6AAAAABZOKHFWWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONBUGY4TKNRVHA . You are receiving this because you were mentioned.Message ID: @.***> [image: jpbetz]jpbetz left a comment (kubernetes/kube-openapi#384) https://github.com/kubernetes/kube-openapi/pull/384#issuecomment-2744695658

Is there an extension-mechanism for format names? Like x-something or example.com/something?

Unfortunately no. The RFC drafts that describe "format" don't provide an extension mechanism. https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2 proposes a core set of formats but the only wording for extensions is:

Implementations MAY add custom format attributes. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support this keyword and/or custom format attributes.

— Reply to this email directly, view it on GitHub https://github.com/kubernetes/kube-openapi/pull/384#issuecomment-2744695658, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVAGQL2FXR3MJJI2GAL2VSREDAVCNFSM6AAAAABZOKHFWWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONBUGY4TKNRVHA . You are receiving this because you were mentioned.Message ID: @.***>

thockin avatar Mar 24 '25 21:03 thockin

Should we make one up?

Yes, that's probably best.

I poked around the set of formats listed in the draft RFC is being kept to a very small list and doesn't look to be accepting almost any new formats (semver was proposed and rejected).

The naming convention of formats is not specified, but the core set use lower-case with heavy uses of acronyms (uuid, uri, ...)

I'd prefer something not terribly long like x-k8s- or just k8s-. But I can see an argument for x-kubernetes- because we use that for JSON Schema customizations already. The context is different here. We're defining string values not field names, but still..

jpbetz avatar Mar 25 '25 03:03 jpbetz

Another option might be label-style? "k8s.io/dns-label"? As long as we pick something unique, we can always adapt later. The biggest concern I have is violating some documented or undocumented assumptions, like "must be alphanumeric".

I defer to your wisdom here

thockin avatar Mar 25 '25 14:03 thockin

This PR has been refreshed and is ready for review.

I've gone with "k8s.io/" as the prefix after consulting a few LLMs about widely used conventions . The main feedback was that standard bodies are steering away from "x-" prefixes, and that the charset in this prefix is within norms.

jpbetz avatar Mar 25 '25 15:03 jpbetz

A couple thoughts on names. Over time we have used 3 different DNS formats. But what we use is not really DNS compatible, it's DNS-ish.

  1. Should we continue with the "DNS" nomenclature? This looks like a significant move towards publishing this terminology so this may be our last chance to name the formats better. Or it might be too late already.
  2. Should we nominate one of the 3 as the "normal" format (e.g. k8s.io/dns-label) and let the others be wierd?

Since DNS 1123 subdomain is the type we use most often for resource name validation, how about a format name that emphasizes that? "k8s.io/basic-name" or something?

jpbetz avatar Mar 25 '25 16:03 jpbetz

How about:

  • k8s.io/qualified-identifier (dns 1123 subdomain)
  • k8s.io/identifier (dns 1123 label)
  • k8s.io/strict-identifier (dns 1035 label)

jpbetz avatar Mar 25 '25 16:03 jpbetz

I am terrible at naming things, but a few quick thoughts:

  1. DNS 1123 Subdomain and Label are most common, 1035 less so, and it looks like 952 has been weeded out. So 1123 is "obviously" the default.

  2. We already use the word "qualified" in "qualified name". I hate it there, too.

  3. We should avoid "simple" in names.

  4. AFAICT Service is the last and only resource which demands 1035. Maybe we can ratchet it to 1123? Can we do the same for other users of 1035?

Then the name could just be "dns-label" or "dns-style-label" or something? Would love alternatives to "qualified name" too :)

thockin avatar Mar 28 '25 23:03 thockin

  1. We already use the word "qualified" in "qualified name". I hate it there, too.

Maybe we just stick with subdomain then? We could consider "dns-full-name" or something. But if we're avoiding "qualified", is "full name" any better?

  1. We should avoid "simple" in names.

+1

  1. AFAICT Service is the last and only resource which demands 1035. Maybe we can ratchet it to 1123? Can we do the same for other users of 1035?

Dropping 1035 from the format list SGTM.

Then the name could just be "dns-label" or "dns-style-label" or something? Would love alternatives to "qualified name" too :)

Curious what value to the user using the term "DNS" adds? That aside, how about:

  • k8s.io/dns-subdomain (dns 1123 subdomain)
  • k8s.io/dns-subdomain-prefix
  • k8s.io/dns-label (dns 1123 label)
  • k8s.io/dns-label-prefix

?

jpbetz avatar Mar 29 '25 00:03 jpbetz

My current concerns:

  • Not sure what value the term "DNS" really adds to the users of these formats (they don't conform to DNS specs.. even though they ALMOST conform)
  • The user of the term "label" is pulling in a term from DNS feels like it could confuse anyone that is thinking in k8s terms and immediately starts thinking of k8s labels when they see it.

jpbetz avatar Mar 29 '25 00:03 jpbetz

I agree that invoking DNS is a bit confusing, especially if we invoke a particular detailed RFC which we then don't quite implement correctly (waves vaguely at all the issues complaining our 1123 validation... isn't)

I also agree that "label" is likely to confuse people in the context of Kubernetes. Perhaps "segment" would be clearer?


Taking a step back, I'm a little fuzzy on which of these goals we're trying to accomplish:

  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) for use by CRDs
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can publish openapi for in-tree types that can be translated into CRDs that will use identical validation
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can switch in-tree types to use generated validation
  • create formats that are based on in-tree validations but fix bugs / inconsistencies in them
  • evolve / change (either tightening or relaxing) validation of in-tree types

Example bugs:

Example inconsistencies / variants:

  • allowing or disallowing underscores
  • allowing or disallowing mixed-case
  • allowing or disallowing leading digits in segments
  • allowing all-numeric segments

liggitt avatar Mar 31 '25 15:03 liggitt

Jordan, I think this was aiming at:

  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can publish openapi for in-tree types that can be translated into CRDs that will use identical validation
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can switch in-tree types to use generated validation

If we can find a better name to use in this that doesn't reference DNS and doesn't abuse the word "label" (which I never really recognized before but now cannot unsee), then that's great.

If we don't need to handle Service, perhaps we can use 1.34 to alpha-gate service name ratcheting to the 1123 -ish spec. 1.34 would tolerate services with 1123-style names, but not allow new ones. 1.35 would allow new usage. Unless we can lean on compat-version to save the alpha-release? @danwinship @aojea

Then we only have to name one thing instead of two.

"kubernetes short name" vs "... long name" ?

thockin avatar Apr 01 '25 16:04 thockin

I've updated this to use k8s.io/short-name and k8s.io/long-name

jpbetz avatar Apr 08 '25 14:04 jpbetz