kpt Better control of relationship between packagerevisons and versions.

Porch are currently very flexible when it comes to how users can create packagerevisions. Some examples are:

Packagerevisions can be created and published in any order, i.e. it is possible to create and publish v2 before creating and publishing v1.
Each packagerevision is completely independent. So it is possible to create version v1 by cloning from package A and then create version v2 directly from scratch (init) or cloning from a completely different package.
We don't enforce increasing versions for the packagerevisions, which means it is possible to create v1 and then having the next version be v100.
We allow multiple draft packagerevisions to exist at the same time. This isn't necessarily a problem, but we do need to consider how we want to handle that during updates of a package and if we will allow more than one of them to ever be published.

This does create some challenges around determining the latest packagerevision for a package and automatically assigning version to a new packagerevision.

We should consider whether it would be helpful to restrict how packagerevisions can be created and versioned. An example could be:

Only the first packagerevision for a package can be created with init or clone. All other must be created by copy/edit of an existing packagerevision.
Always auto-assign the version of a packagerevision. So the initial packagerevision will have v1 and the new version for every new packagerevision created with copy/edit will be the latest used version plus one. This effectively forces all users of porch to adopt a versioning scheme with simply an increasing number rather than something like semver.
Only a single draft packagerevision can exist at the same time. So a packagerevision must either be approved or deleted before a new one can be created.

Aug 17 '22 16:08 mortent

I have a proposal to tackle some parts of this issue. There are some restrictions we can introduce to make it easier to reason about a package.

Package creation and editing

Background

This problem is outlined above:

Each packagerevision is completely independent. So it is possible to create version v1 by cloning from package A and then create version v2 directly from scratch (init) or cloning from a completely different package.

Proposal

I agree that we should make it so that the first package revision can only be created from init or clone. Subsequent package revisions should be created with edit or copy. This will force packages to have one distinct origin: either one upstream package in the case of clone, or the first package revision in the case of init.

Drafts and revision number assignment

Background

One thing that I've found odd when working with porch is that the revision number for a package revision needs to be assigned when you create the draft. This can lead to undesirable scenarios:

You can create draft v2 before you create draft v3. Then you can publish draft v3 before draft v2. The end result is that the order in which the package revisions are published is different from the semantic order of the revision numbers.
You have a published draft v1. You can create draft v2 before you create draft v3. You then realize you no longer need draft v2 and discard it. You then publish draft v3. Your published drafts are now v1 and v3 (v2 is skipped).

Comparison to git

In git, there is a concept of a named ref, which is usually one of two things:

an immutable, published tag that corresponds to a version
a branch, named descriptively, whose contents are mutable as features are added

The revision number in porch seems to be neither of these two things; instead, it is just an identifier for a particular package revision, but the version number itself doesn't necessarily correlate to the order of published revisions, nor does it give a description of what changes are included in the package revision.

Proposal

Instead of having the revision number assigned to a package revision when the draft is created, the revision number should be assigned upon approval. Draft package revisions' revision field should be empty until they are published.

This will ensure that the versioning correlates to the order in which the package revisions are published, and that version numbers don't need to be skipped when draft package revisions are discarded. If we choose to allow semver, this will also allow users to choose whether this package revision should be considered a major, minor, or patch revision upon publishing rather than upon creation of the draft, which will be beneficial in the case that the user needs to make unforeseen changes to a draft.

However, draft package revisions will still need some sort of identifier. We can introduce an additional field to the package revision spec, description, to allow a very short, up to 50-character description of the changes included in this package revision. For example, the description can be something as simple as new-starlark-fn. Upon publishing of the package, we should clear the description field.

Alternatives

We could require that there can only be one draft of a package at all times. This would resolve the issues with the ordering of published package revision numbers, but would be extremely restrictive if users want to make changes to a package in parallel.
Instead of having an additional description field to identify package revisions when they are drafts, we allow the user to put their short description into the revision field. The description would be overwritten by the published revision number upon publishing. However, it could become confusing to use the same field for two different purposes.
Instead of clearing the description field upon publishing of the package, we could consider either keeping it or folding it into the commit history somehow, so that the user can use it to keep track of what changes have been made between published package revisions.

Versioning restrictions

Background

This problem is outlined above:

Packagerevisions can be created and published in any order, i.e. it is possible to create and publish v2 before creating and publishing v1.

We don't enforce increasing versions for the packagerevisions, which means it is possible to create v1 and then having the next version be v100.

There are no restrictions whatsoever on what revision numbers can be. The edit and copy task has a notion of a default next revision number, which is the next semver number, but this is not enforced. I could create a package revision with a revision "number" of foo, and create the next package revision with a revision "number" of bar.

Proposal

We have two primary options for restrictions on package revision numbers:

Very strict versioning with a single number, e.g. only allow v1, v2, in increasing chronological order
Allow semantic versioning, where the next version number for a package revision can only be a semver major, minor, or patch bump. For example, if you have v1.0.0 as the initial package revision number, the next package revision number would be restricted to v1.0.1, v1.1.0, or v2.0.0.

From discussion in meeting, the first option is the simpler and more reliable approach that we should enforce for now. Porch can automatically assign the next revision number upon publishing of a package.

If a user needs to branch off a package into two different versions, they can maintain it as two different packages, rather than two different versions of the same package.

Alternatives

Another option we have here is to allow the user to define their own rules. By default, we could disallow semantic versioning. If the user requires more semver, they could define these rules in an additional KRM file in the package. We would need to define the syntax and GVK of this new KRM file containing these rules. One reason I am hesitant to introduce this early on is that this adds another configuration file for the user to understand and keep track of, increasing the complexity of the configuration they need to maintain.

Edited to add: In PR review, we decided to change description to workspaceName.

Sep 04 '22 23:09 natasha41575

I like this proposal:

Package creation and editing: Yes, this makes sense to me. I can't think of any reason why v2 of a package wouldn't be based on v1. I think this will also make the Package resource much more useful, as it will allow us to include the source (clone or init) in the Package resource and therefore allow direct creation of Package resources.
Drafts and revision number assignment: I like the idea of having a description field that can be used to provide information about drafts and then we just assign version numbers when a revision is published. I'm not sure I see any reason to clear the value when a packagerevision is published though, as it probably will still be useful information about how the new packagerevision is different from the previous one.
Versioning restrictions: I think enforcing strict versioning with a single number makes sense now, although we should be careful about making assumptions in the core logic that will prevent us from supporting other schemes in the future. I do kinda like the idea of allowing users to specify rules for versioning packagerevisions on the package resource itself. But probably not worth adding right now.

Sep 12 '22 17:09 mortent

Natasha's proposal and Morten's follow up comments make sense to me. Should we also capture additional metadata about the draft - like who created it? I guess that's a separate topic and can be done later.

The rule-based versioning is interesting and perhaps can go in the backlog. Strict sequential numbering works for now.

Sep 12 '22 18:09 johnbelamaric

Some questions may worth thinking of:

Which case best fit in the package update, init/clone or edit/copy?
For drafts and revision number assignment, I like the idea of creating at approval time. I'm a little bit doubt about using the spec.description as the draft ID. Mainly because that the term "description" is normally used for human-readable info and mostly is mutable, while identifier should be unique and immutable. Also, is this draft UI required to be a user-aware field? Can porch assign an auto-generated id?
Incremental versioning makes sense to me. Can you explain what " branch off a package into two different versions" means? After branching, how does the new package version look like?

Sep 12 '22 20:09 yuwenma

Thanks for the feedback @yuwenma!

Which case best fit in the package update, init/clone or edit/copy?

I think we eventually want to do a package update as a "reclone and replay", so walking through all the previous tasks and reapplying them. But @justinsb or @mortent can clarify the direction that we are heading here. Right now update is implemented as its own update task, which does a merge update, and it can only be done on a draft package revision. So it's neither init/clone nor edit/copy.

For drafts and revision number assignment, I like the idea of creating at approval time. I'm a little bit doubt about using the spec.description as the draft ID. Mainly because that the term "description" is normally used for human-readable info and mostly is mutable, while identifier should be unique and immutable. Also, is this draft UI required to be a user-aware field? Can porch assign an auto-generated id?

Yeah, I agree that the description shouldn't be considered the "draft ID" necessarily (I think I misworded this in my proposal). The revision metadata.name includes a porch-generated, unique, immutable hash identifying the package revision, and we should consider that the package revision ID. The "description" field is more for the user (a human) to identify their package revision without having to memorize the package revision hash.

Incremental versioning makes sense to me. Can you explain what " branch off a package into two different versions" means? After branching, how does the new package version look like?

For example, normally in semver we may maintain both v2 and v3 of some software. Instead of incrementally strictly going up, we may have patches to the old v2 and the new v3. So if we have both v2.0.0 and v3.0.0, we can later release both v2.0.1 (for users who need the patch but cannot upgrade to v3) and also release v3.0.1. But we don't want to allow semver in porch (yet).

This proposal states that if a user needs similar functionality, i.e. they need to make some breaking change to their package but still maintain the old version of the package, they should create two separate packages.

Sep 12 '22 20:09 natasha41575

Another minor point of discussion, when we create the first package revision, we could either start with v0 or v1. I'm leaning toward starting with v1, but am open to other opinions.

Sep 12 '22 21:09 natasha41575

In regards to drafts, any thoughts towards making draft revisions immutable, so anytime a user updates a draft, a new draft revision is created?

An advantage of this is that this will allow us to show a timeline of draft package updates and showing the changes between draft revisions. This can help show what the user changed after cloning a package (whereas today, the clone rending and user changes are mixed together). This does increase complexity and the number package revisions we need to track though...

Sep 12 '22 21:09 ChristopherFry

What is still unclear to me here is what is the mechanism to determine a new revision is needed? Is this all handled on the server side or is the client involved and what mechanism is envisioned here?

So here is a flow I am using right now.

I determine the latest revision for the package
I determine the KRM resources I need.
I check the diff between the latest revision and the new KRM resources. 4a: if no diff don't update the package revision -> done 4b. if a diff is determined a new package revision would be created.

All of this is executed on the client side but with this enhancement I hope more if not all is done on the server side.

Sep 13 '22 19:09 henderiw

In Porch (server side), revisions will be created like this:

When creating a package from scratch, it starts at revision 1.
When cloning a package, say A to a new repo, you get a new package A' with upstream pointed to A. A' starts at r1 regardless of the revision number of A
When a user makes changes to a package, it is done in a Draft branch when using Porch. Draft branches do NOT have revision numbers.
When you Approve a Draft, it moves to Publish and is assigned the next revision number at that time.

Sep 13 '22 20:09 johnbelamaric

@henderiw this is all server side and automatic

Sep 13 '22 20:09 johnbelamaric

@johnbelamaric On your bullet 3 -> When make changes to a package. To determine a change is required is still a client side job I believe or am I wrong?

Sep 14 '22 08:09 henderiw

I am not sure I understand the question completely. Yes, the client will decide what changes are needed to a package. But "client" here could be a controller - for example the controller we're working on in #3488. It could also be the GUI. Or via the CLI rpkg commands.

The basic Porch commands won't automatically make any changes to a Draft package after cloning. There are certain changes that may happen automatically on clone. For example, when you rpkg clone to a new name, the package name in the package-context.yaml is updated, IIRC. For some packages (in particular, the namespace packages used for provisioning namespaces), that package name is used as input to functions like set-namespace. This could generate a bunch of changes. But since that's on clone it's still going to be rev 1 of the new package.

Porch (server-side) will determine if a new revision number is needed. I guess some deltas between the Draft and current revision may be required for that to happen, but I am not sure. You may be able to force it with just a Clone / Propose / Approve cycle.

Sep 14 '22 15:09 johnbelamaric

John, a client can indeed be a controller. The scenario I had in mind is this. You have some KRM resources the client owns in the package. The controller creates them and is the owner of them. Now a change is needed based on a reconciliation trigger, which results in a crud operation on these KRM resources.

So I was wondering how this scenario would be working with this enhancement. Will the client do its changes to the KRM resources it owns and the server will create a new revision automagically or is the client having to do the diff handling to determine a new package revision is needed and the server takes care of the revision handing.

Sep 15 '22 07:09 henderiw

Yeah, I see your point. As part of reconciliation, the controller needs to decide if the resulting package has changed after re-applying any inputs or transformations or whatever the controller does. And you want to know how to do that. It seems like we should be able to make that part of the API - that we should be able to tell if there are an material changes (as opposed to artifacts of the simple act of creating a Draft). @natasha41575 WDYT?

Sep 15 '22 15:09 johnbelamaric

@henderiw @johnbelamaric I want to make sure I understand the proposal correctly. Is the suggestion that, if the porch sever sees a diff between the in-cluster KRM resources and the latest package revision, that the porch server will then automatically create a new package revision with the changes?

Will the client do its changes to the KRM resources it owns and the server will create a new revision automagically or is the client having to do the diff handling to determine a new package revision is needed and the server takes care of the revision handing.

I think we could make it part of the API to allow the client to easily check for a diff and create a new package revision accordingly. I'm curious if that would support the use case here. We would have to think about what it means for there to be a diff (Do we check all package revisions? Just the latest published one? All drafts?). But I think it is theoretically doable if we make some assumptions.

I'm less certain about porch creating a new package revision automatically without the client specifically requesting it. Some scenarios that come to mind if it is automatically created:

A user uses another tool to apply some sort of private data (like a Secret) to the cluster. IIUC, porch would see the diff and create a new package revision, meaning that it would end up writing that Secret out to git.
A user is manually playing around with the cluster just to see what happens. Porch will either create a new package revision for each individual small change or have to figure out when to update a draft in-place vs when to create a new draft. The user will then have to manually clean up any automatically created package revisions if they didn't intend them to be created or their changes to be published.

Edited to add: From discussion in meeting, parts of this comment may not be relevant because I think I didn't fully understand the use case when I wrote up this reply.

@mortent Do you have any thoughts on this?

Sep 15 '22 16:09 natasha41575

@ChristopherFry It seems like what you are asking for is the state of the packagerevision for every mutation. We do capture this as individual commits in git (and individual layers in oci), but we don't currently expose this information through the API. Creating new draft packagerevisions for every change seems like it would create a lot of drafts that will never be published and that we would need additional tooling to manage. I think we could look into whether there is a way to allow quering for specific states of the packagerevision when doing a GET on the package.

An idea is that we could let users specify the tasks that they want to see applied in the packagerevisionresources, and the API would return the state of the packagerevision as it looks after only those tasks have run. I would need to look more into it to determine if it is feasible. Alternatively, we could introduce yet another resource type that provides a view into the data we have in git to allow for fetching packagerevision resources at specific states.

I'm not sure I follow the last question here. Is the question how/who should decide when packagerevision should be proposed/published and therefore deployed into clusters? It seems like that would be specific to each use-case, so I'm not sure porch itself can make that determination. It seems like it would be up to the client. But it is obviously possible that the client here is a controller.

If the client wants to make this decision based on what would change in the cluster, it would be possible to leverage the kpt alpha live plan command, that does a dry-run of the package against the cluster and then produces a Plan krm resource that has a structured description of the changes that will be performed when running apply.

Sep 15 '22 16:09 mortent

If the client wants to make this decision based on what would change in the cluster, it would be possible to leverage the kpt alpha live plan command, that does a dry-run of the package against the cluster and then produces a Plan krm resource that has a structured description of the changes that will be performed when running apply.

Ooo, that's an interesting idea.

Sep 15 '22 20:09 johnbelamaric

Is the suggestion that, if the porch sever sees a diff between the in-cluster KRM resources and the latest package revision, that the porch server will then automatically create a new package revision with the changes?

I don't think that is what Wim was suggesting. But it actually is an interesting feature. We have the concept of configuration drift. When there is drift, you can either resolve the drift by reconciling back to the intended state, or you can resolve it by updating the intended state. From an declarative management perspective, we always do the former (change the live state to reflect the intended state). In fact as a best practice we avoid allowing mutations to resources that Config Sync manages by using a policy web hook.

I have seen requests though for supporting the opposite workflow at times. For example, an urgent real-time fix on a cluster could be backported to the package so that it is applied across all other instances of the package. In general there would need to be some sort of process / human level controls on that kind of backport. But being able to calculate the deltas and import them could be useful, particularly in package authoring phases, where experimentation can be done on a live cluster and then reincorporated into the package.

Sep 15 '22 21:09 johnbelamaric

I was trying to clarify, who triggers the creation of a new package revision. Is it the role of the client or the server? My take from all the feedbacks is that:

it would be better that the client is in charge. The client can be assisted by the server to perform this task.

Sep 19 '22 06:09 henderiw

cc @natasha41575 @mortent @droot @justinsb

When choosing to limit ourselves (for now anyway) to a single, sequential numerical version, we discussed the distinction between the version of the configuration package and the underlying software, and I believe I suggested we just use the name of the package for the software version. We didn't lay that out specifically, but what I had in mind was that we would have packages with maybe the major.minor version in the package name, and the patch version being just somewhere in the tags associated with specific PackageRevision versions. So, off-the-shelf vendors would generally align PackageRevision tags and patch releases, but, say, platform teams may have multiple PackageRevisions to represent their config preferences shifting over time for a given patch release

Vendor my-software-v1.0, v22, release v1.0.4 my-software-v1.0, v23, release v1.0.5

Org my-software-v1.0, v1, upstream v22, release v1.0.4 my-software-v1.0, v2, upstream v22, release v1.0.4 my-software-v1.0, v3, upstream v23, release v1.0.5 my-software-v1.0, v4, upstream v23, release v1.0.5 etc.

I think that can work.

However, where I still am uncertain is how we upgrade from v1.0 to v1.1, if that's a different package altoghether. What support do we have for rebasing onto a new upstream package? And @natasha41575, how does that affect our UpstreamPolicy work?

Oct 01 '22 14:10 johnbelamaric

@johnbelamaric don't you see a case where a configuration package can consists of multiple source packages? How do you see that working in the above scheme?

Oct 12 '22 05:10 henderiw

if we delete a configuration package I believe we have to delete all the package revision history and if we recreate we start from scratch meaning v1.0 or so. The meaning of the tags could change in this case as well. Not sure if this matters but can happen.

Oct 12 '22 06:10 henderiw

However, where I still am uncertain is how we upgrade from v1.0 to v1.1, if that's a different package altoghether. What support do we have for rebasing onto a new upstream package?

I don't think we have support for rebasing onto a new upstream package in the porch server. As @mortent describes in https://github.com/GoogleContainerTools/kpt/pull/3603#discussion_r993954378, we plan to enforce the invariant that all package revisions will have the same origin package.

I think it would be possible for UpstreamPolicy or some controller to do some sort of version of a rebase, where it creates a new package revision and does a reclone-and-replay using the new upstream package... but it would have to do so in a different, new downstream package because it would have a different origin. As a result, the client would end up creating a new package for every rebase, which seems like an awkward workflow.

@mortent @justinsb Have we ever considered adding support for rebase onto a different package?

Oct 13 '22 15:10 natasha41575

@johnbelamaric don't you see a case where a configuration package can consists of multiple source packages? How do you see that working in the above scheme?

Are you saying sub-packages / an umbrella package? There are two approaches we have discussed for that. One is "static" sub-packages - copying the packages into the umbrella package, simply as sub-directories. The other is the "app of apps" pattern, which is a "by-reference" pattern. Basically, one package contains pointers to other packages (probably via R*Sync objects).

We haven't really sorted out which to use in what situation or what the best practices around these would be. In the static sub-package case, with respect to versioning, the umbrella package revision number has to change whenever any change happens in any sub-package. In the by-reference case, you should still get the same behavior - that is, stability in the contents of the entire umbrella package - because sub-packages cannot be changed without generating a new revision number (and the reference includes the revision number).

Oct 13 '22 15:10 johnbelamaric

@mortent @justinsb Have we ever considered adding support for rebase onto a different package?

So I think we should distinguish between allowing rebase onto a different package, and whether the two resulting PackageRevisions would be part of the same package. With the way we capture the changes performed on a PackageRevision (for PackageRevisions created and updated through Porch), we do have the information needed to replay the changes. I think it should be possible to replay those on another package, but since we rely on patches it seems like it would be fragile. The rebase-and-replay functionality on different versions of the same package would have the same challenge though.

But that aside, there doesn't seem to be any reason why this couldn't be implemented. I'm thinking this is something an external tool/controller could add, rather than us supporting it directly in Porch.

As for the discussion yesterday around how we should handle deletion of packagerevisions to prevent a deletion from essentially resulting in a rollback, I think it depends on how we consider deployment repositories (non-deployment repos seems irrelevant for this discussion). If we consider the deployment repos as containing exactly the desired state of the clusters at any point in time (this is the Nephio approach), then deleting the whole package it should be removed from the cluster seems like the right thing to do. Similarly, if a packagerevision is deleted, then you would expect that a different revision will be deployed to the cluster. There are some challenges around this today since we don't have any way to delete a full package and all packagerevisions in one operation, so we should look at how we can handle that.

We have also ideas around a separate rollout controller that would be responsible for updating the desired version for specific packages. In this case, it seems like removing a package from a cluster should be handled by the rollout controller, so deletion from the deployment repo would not be strictly necessary. If a packagerevision in the deployment repo is deleted from underneath the rollout controller, that seems like it should be considered an error situation. Presumably the existing version in the cluster will remain there until the situation is resolved.

Oct 14 '22 16:10 mortent

This has been implemented. Any follow-ups should be handled in separate issues.

Jan 26 '23 04:01 mortent

kpt kpt copied to clipboard

Better control of relationship between packagerevisons and versions.

Package creation and editing

Background

Proposal

Drafts and revision number assignment

Background

Comparison to git

Proposal

Alternatives

Versioning restrictions

Background

Proposal

Alternatives

kpt
kpt copied to clipboard