Improve API versioning
Currently, we use the "enable-api-fields" feature flag to allow users to opt into features at different stability levels, as described in TEP-0033.
We have discussed moving towards maintaining multiple versions of CRDs in the future (for example, a v1 version with stable features and a v2alpha1 version with stable and alpha features), but the main reason we don't do this is because it creates a large maintenance burden for contributors.
This issue can be used to brainstorm ways to make it easier to maintain multiple API versions of our CRDs. Some ideas that have already come up:
- move as much code as possible out of the apis package, such as validation functions, so that it doesn't need to be maintained in multiple places
- create a CRD version that's stored but not served, and convert all other CRD versions to this one; this would be the version used in the reconciler
- encourage users to use the dynamic client instead of the Tekton versioned client (see the sketch after this list); however, we'd still need to support all versions of the CRD, so I don't think this is really easier from a contributor-effort perspective
- some kind of tooling to be able to easily view the diff between pipeline version packages?
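For the dynamic-client idea above, here is a minimal sketch of what a caller might look like. The GroupVersionResource values are the obvious ones for Tekton Tasks; everything else (kubeconfig loading, namespace) is purely illustrative.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a rest.Config from the local kubeconfig (illustrative only).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// With the dynamic client, the caller pins the GroupVersionResource
	// themselves instead of importing a generated, versioned Tekton client.
	tasks := schema.GroupVersionResource{Group: "tekton.dev", Version: "v1beta1", Resource: "tasks"}

	list, err := client.Resource(tasks).Namespace("default").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, t := range list.Items {
		// Items come back as unstructured.Unstructured rather than typed structs.
		fmt.Println(t.GetName())
	}
}
```

Since results come back as unstructured objects and the server still has to serve every version, this mostly moves effort around rather than removing it, which is the caveat noted in the bullet above.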
@dibyom has written up a proposal for a new API versioning strategy: https://docs.google.com/document/d/15vD5vbK5UKCDFMoOqOwEc_NYACthtCZJlTJiiEEoljY
Looking at the scheduler-plugins repo, they've found a way to register "internal versions" of the types their controller operates on (example: https://github.com/kubernetes-sigs/scheduler-plugins/blob/7da686c01cf3bc5d8fe852334698468e046bb000/apis/config/register.go#L26), and then all the versioned types are automatically converted into the unversioned internal type.
@dibyom I know you originally suggested something like this and it seems like it's actually possible which is very cool! We'd have to change our storage version over to this "unversioned" type, but once we do this, I would guess we wouldn't have to do it again (e.g. if we want to create a v2 api). @tektoncd/core-maintainers thoughts on this?
FYI @JeromeJu
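A rough sketch of that registration pattern, transplanted onto hypothetical Tekton package and type names (the paths and types are illustrative, not a proposal for actual locations), showing how an internal, never-served version is registered with a scheme:

```go
// Package pipeline would hold the unversioned "internal" types, mirroring the
// register.go pattern used by scheduler-plugins. All names here are hypothetical.
package pipeline

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// GroupName stays the same; the version is the special internal marker
// ("__internal"), which is never served or stored by the API server.
const GroupName = "tekton.dev"

var SchemeGroupVersion = schema.GroupVersion{Group: GroupName, Version: runtime.APIVersionInternal}

var (
	SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
	AddToScheme   = SchemeBuilder.AddToScheme
)

// addKnownTypes registers the internal types; each versioned package would
// register its own types plus conversions to/from these.
func addKnownTypes(scheme *runtime.Scheme) error {
	scheme.AddKnownTypes(SchemeGroupVersion,
		&Task{},
		&TaskList{},
	)
	return nil
}

// Minimal stand-ins for the internal types; deepcopy-gen would generate the
// real DeepCopyObject implementations.
type Task struct {
	metav1.TypeMeta
	metav1.ObjectMeta
}

func (t *Task) DeepCopyObject() runtime.Object { out := *t; return &out }

type TaskList struct {
	metav1.TypeMeta
	metav1.ListMeta
	Items []Task
}

func (tl *TaskList) DeepCopyObject() runtime.Object { out := *tl; return &out }
```

The versioned packages would then register conversions to and from these internal types, and only the API boundary would touch the versioned structs.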
If this is possible with knative/pkg without too much trouble then I'd vote yes. It would tremendously simplify the conversions between versions (and also allow us to prototype much faster with new experimental apiVersions!)
Looks very cool indeed! thanks @lbernick 🆒
It would tremendously simplify the conversions between versions
Could we elaborate on how this is going to simplify the workload for conversion? Or are we referring to the future work of implementing the conversions? This feels to me like a shift of the storage version that everything is converted to.
Another question in mind is how CRDs/features that are deprecated but not yet removed (i.e. ClusterTask) should be handled. Would this implicitly require waiting until those are removed?
Could we elaborate on how this is going to simplify the workload for conversion? Or are we referring to the future work of implementing the conversions? This feels to me like a shift of the storage version that everything is converted to.
My guess is we will probably have the "internal" type be the same as the v1 types we've defined, and reuse the existing conversion functions we have (although we might just need to update their signatures).
Another question in mind is how CRDs/features that are deprecated but not yet removed (i.e. ClusterTask) should be handled. Would this implicitly require waiting until those are removed?
We might be able to do this independently for each CRD (not sure). For ClusterTask, we can probably just leave that in v1beta1 since we won't be adding new versions.
We might have to experiment a bit to see how this will work; I could be wrong about these points.
Super nice, but a quick note on this: because this would be the storage version, we would have to be very careful about the changes we make in there 👼🏼, and this might lead to a "relatively" big object (go struct I mean) because we would be bringing with us a lot of "deprecated" fields that are there for backward compatibility. But I do agree, we need to experiment a bit with it 👼🏼
We might be able to do this independently for each CRD (not sure). For ClusterTask, we can probably just leave that in v1beta1 since we won't be adding new versions.
Yes, we can definitely do this on a CRD basis.
Yeah, we definitely need to experiment to see the viability of this approach before deciding if we can actually do this.
I decided to look into how k8s handles these problems in a bit more detail.
They have a small number of served versions of the API, and the storage version is the latest stable version. They have an internal version of their API, which is not serialized, that their code operates on.
Defaulting is done when a versioned CRD is created or updated, but validation is done on the internal version of the API rather than the versioned APIs. Conversion code is mostly autogenerated.
If they need to add a new unstable feature, they backport it to the stored version as well (i.e. the latest stable version), gating it behind the feature flag. This is so that objects can successfully round-trip between API versions. They have done round-tripping in the past via annotations, the way we do currently, but found it too error-prone (slack convo), so they decided to add the unstable fields to stable APIs instead. They tombstone fields but never remove them.
I asked why they don't make the internal version the storage version, and the answer is that this is strictly more work than what they do currently and is just another version to handle (since the latest stable version contains all the fields anyway).
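To make that pattern concrete, here's a rough sketch of "backport the field to the stored version, gated behind the feature flag"; the field name, flag plumbing, and types are all made up for illustration:

```go
// Package apisketch sketches gating an alpha field inside a stable (stored)
// type; these are not the actual Tekton types.
package apisketch

import (
	"context"
	"fmt"
)

// TaskRunSpec is a pared-down stand-in for a stable (stored) type.
type TaskRunSpec struct {
	Timeout string
	// ExperimentalThing is a hypothetical alpha field that lives directly in
	// the stable type, following the k8s approach, so objects round-trip
	// losslessly between served versions.
	ExperimentalThing string
}

type featureFlagsKey struct{}

// alphaAPIEnabled is a stand-in for however the feature flag would be read
// (e.g. from a config map carried in the context).
func alphaAPIEnabled(ctx context.Context) bool {
	enabled, _ := ctx.Value(featureFlagsKey{}).(bool)
	return enabled
}

// Validate rejects the alpha field unless the cluster operator has opted in,
// so the field stays inert in the stable API by default.
func (ts *TaskRunSpec) Validate(ctx context.Context) error {
	if ts.ExperimentalThing != "" && !alphaAPIEnabled(ctx) {
		return fmt.Errorf("experimentalThing requires \"enable-api-fields\" to be \"alpha\"")
	}
	return nil
}
```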
Here's what I think we should do:
- Swap over to v1 as our storage version, which we are doing anyway (https://github.com/tektoncd/pipeline/issues/5541), and have the reconciler code operate on v1 structs.
- Create an internal version of Task and swap our reconciler code over to use the internal version, leaving v1 as the stored version. (I'm suggesting swapping reconciler code to v1 first and then swapping to the internal version because the v1 swap is more urgent and will allow us to deprecate/stop maintaining v1beta1, while we can take more time with the internal version swap since it can have the same schema as v1.)
- If that goes well, we can swap reconciler code for the other CRDs
Optional follow-ups:
- Decide if we want to use the internal CRD versions as stored versions instead of v1. (I did some quick experimentation and verified that the storage version of a CRD doesn't need to be served.)
- If conversion via annotations becomes a problem, consider whether we want to add unstable fields to stable APIs instead, the way k8s does. For now, conversion via annotations (sketched after this list) seems to be working mostly fine, unless annotations get too large.
- Look into autogenerating conversion code.
- Consider whether we want to do validation on the internal api instead of the versioned apis.
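The annotation-based round-tripping mentioned above works roughly like this; this is a simplified, hypothetical sketch with an invented annotation key and helpers, not the actual Tekton conversion code. When converting down to an older version, fields the older version can't express are serialized into an annotation and restored on the way back up.

```go
// Package conversion sketches stashing newer-only fields in an annotation so
// they survive a round trip through an older API version.
package conversion

import "encoding/json"

// hypothetical annotation key used to round-trip fields the older API version
// cannot represent
const alphaFieldsAnnotation = "example.tekton.dev/alpha-fields"

// AlphaFields is a stand-in for whatever newer-version-only fields need to
// survive a round trip through the older version.
type AlphaFields struct {
	ExperimentalThing string `json:"experimentalThing,omitempty"`
}

// stashAlphaFields serializes the newer-only fields into an annotation on the
// downgraded object.
func stashAlphaFields(annotations map[string]string, fields AlphaFields) (map[string]string, error) {
	data, err := json.Marshal(fields)
	if err != nil {
		return annotations, err
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[alphaFieldsAnnotation] = string(data)
	return annotations, nil
}

// restoreAlphaFields reads the annotation back when converting up again.
func restoreAlphaFields(annotations map[string]string) (AlphaFields, error) {
	var fields AlphaFields
	raw, ok := annotations[alphaFieldsAnnotation]
	if !ok {
		return fields, nil
	}
	err := json.Unmarshal([]byte(raw), &fields)
	return fields, err
}
```

Kubernetes caps total annotation size at 256 KiB per object, which is where "annotations get too large" starts to bite.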
(I did some quick experimentation and verified that the storage version of a CRD doesn't need to be served.)
Awesome! Out of curiosity, what does the served field signify then?
@dibyom served means it's "visible" to clients / users, and thus kubectl. If it's not served, you, as a "client" cannot query it.
@dibyom served means it's "visible" to clients / users, and thus kubectl. If it's not served, you, as a "client" cannot query it.
But aren't the clients/informers used in the reconciler also considered "clients" or users? How does the reconciler get to fetch/update/watch for changes but not other clients?
/assign
Hi @tektoncd/core-maintainers, I want to update our proposal for approaching this issue:
- Pull in conversion-gen (a rough sketch of this wiring is below)
- Copy the v1 Task to pkg/apis/pipeline and generate conversion code between the internal version and v1beta1/v1
- Replace reconciler code to use the internal Task instead of the v1 Task
- Update the v1 and v1beta1 validation functions to convert to the internal version and then call the internal Task's validation
For more details and alternative discussions, please take a look at this doc. We're looking forward to any suggestions and feedback. Thanks!
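Here is what that conversion-gen wiring might look like; the internal package path is hypothetical, and the tag plus generated-function naming follow conversion-gen's usual conventions:

```go
// Package v1beta1 is the versioned API package. The tag below tells
// conversion-gen where the internal ("unversioned") peer package lives, so it
// can emit zz_generated.conversion.go next to these types. The internal
// package path here is hypothetical.
//
// +k8s:conversion-gen=github.com/tektoncd/pipeline/pkg/apis/pipeline
package v1beta1

// conversion-gen then generates functions with predictable names, roughly:
//
//	func Convert_v1beta1_Task_To_pipeline_Task(in *Task, out *pipeline.Task, s conversion.Scope) error
//	func Convert_pipeline_Task_To_v1beta1_Task(in *pipeline.Task, out *Task, s conversion.Scope) error
//	func RegisterConversions(s *runtime.Scheme) error
//
// Hand-written Convert_... functions in this package override the generated
// ones when the schemas diverge and a field needs custom handling.
```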
@Yongxuanzhang essentially, the proposal is to
- no new version in the CRD (aka the internal version is just "in-code")
- put all "logic" into the internal object (by logic I mean validation, defaulting, …)
- make the reconciler work on the internal object
  - it initially fetches whatever is stored (today v1) and directly converts it to internal…
  - … works based off the internal object, and …
  - … converts back to v1 when we mutate the object (mainly status updates)
  - we would just have to "wrap" ReconcileKind and some other functions to go from v1 -> internal or internal -> v1
- use conversion-gen to generate conversions from the versioned APIs to an internal object, so that we don't have to maintain that code ourselves
Am I right?
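A rough sketch of that "wrap ReconcileKind" step, with a placeholder internal type and conversion helpers standing in for whatever generated conversions would provide (the real knative-style ReconcileKind returns a reconciler.Event rather than a plain error):

```go
package taskrun

import (
	"context"

	v1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
)

// internalTaskRun stands in for a hypothetical internal TaskRun type (it would
// really live in something like pkg/apis/pipeline).
type internalTaskRun struct {
	// fields elided
}

// Reconciler wraps reconciliation so that only the boundary touches v1.
type Reconciler struct{}

// ReconcileKind still receives the served/stored v1 object from the generated
// reconciler machinery...
func (r *Reconciler) ReconcileKind(ctx context.Context, tr *v1.TaskRun) error {
	// ...converts it to the internal representation at the boundary...
	var internal internalTaskRun
	if err := convertV1ToInternal(tr, &internal); err != nil {
		return err
	}

	// ...does all the real work against the internal type...
	if err := r.reconcileInternal(ctx, &internal); err != nil {
		return err
	}

	// ...and converts mutations (mainly status) back onto the v1 object before
	// it is written back.
	return convertInternalToV1(&internal, tr)
}

func (r *Reconciler) reconcileInternal(ctx context.Context, tr *internalTaskRun) error {
	// reconciliation logic that never imports a versioned API package
	return nil
}

// Placeholders for whatever conversion-gen (or hand-written conversions)
// would provide.
func convertV1ToInternal(in *v1.TaskRun, out *internalTaskRun) error { return nil }
func convertInternalToV1(in *internalTaskRun, out *v1.TaskRun) error { return nil }
```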
Yes. That's right, but I think we don't do defaulting for internal version.
Yes. That's right, but I think we don't do defaulting for internal version.
Alright, I think it could be nice (we need to validate it works well) 👍🏼.
Hi @tektoncd/catalog-maintainers, @tektoncd/catlin-collaborators. Just want to bring this to your attention and see if there are any concerns regarding the internal version proposal.
The proposal should not involve any user-facing changes; the impact for Pipeline developers would be:
- We need to apply changes to both the internal version and the external versions (e.g. v1)
- We need to implement conversions between the internal version and the external versions; in this proposal we use conversion-gen, but custom conversions may be needed in the future.
I will document all of these impacts in the developer docs when implementing this feature.
/lifecycle frozen