
Support bulk package upgrades

Open • mortent opened this issue 2 years ago • 7 comments

We want to let users perform operations on multiple packages at the same time, i.e. in bulk. The most obvious use case for this is letting users update all downstream packages in one operation.

mortent, May 25 '22 15:05

I changed this issue to be just about bulk package upgrades.

I split bulk package creation into its own issue: #3347

I also split bulk function upgrades into their own issue: #3309

When we think of more useful bulk operations, we can create separate issues for them.

bgrant0607, Jul 07 '22 14:07

We showed bulk upgrade in the prototype demo: https://www.youtube.com/watch?v=d_iV22_6nAM

bgrant0607, Jul 07 '22 16:07

See also #3189

bgrant0607, Jul 07 '22 16:07

@mortent I have some questions about the requirements for this issue.

It seems to follow that if a user does a bulk upgrade, then each downstream package revision affected by the upgrade will get a new UpdateTask in its task list; is that correct? Is that something we expect Porch to handle, or the client?

I see the WoW demo demonstrates updating all downstream packages in one operation. Will we also have to support the use case of updating just some of the downstream packages in one operation?

natasha41575, Aug 10 '22 22:08

So I think we want to implement this as a CRD and a controller rather than as a regular API on the aggregated API server. That means we need to define what the CRD for something like this should look like, keeping in mind that we will probably need to implement other bulk-like APIs in a similar way.

So I've thought about a few ways we can do this.

```yaml
apiVersion: config.porch.kpt.dev/v1alpha1
kind: BulkPackageUpgrade
metadata:
  name: bulk-upgrade # Kubernetes object names must be lowercase
  namespace: default
spec:
  targetPackages: # must be in the same namespace as the CR
    # Option 1: List all packagerevisions (should this be packages instead?) that should be updated explicitly.
    list:
    - my-repo-9626794e984ff13c9a4c64df5af0f15ec3a146bf
    - my-repo-526fa27229adcc3b6a9a544c455c344a3b4d7597
    # Option 2: LabelSelector to find the packages that should be updated.
    selector:
      matchLabels:
        package: my-packages
    # Option 3: Specify the parent package and all direct downstream packages will be updated.
    parentPackage: my-package
  targetVersion: v45
status:
  observedGeneration: 4
  conditions:
    ...
  packages:
  - name: my-downstream-package
    status: Updated
    revision: my-repo-526fa27229adcc3b6a9a544c455c344a3b4d7597
```
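The proposal shows the three targeting options side by side without saying whether they can be combined. As a rough sketch, assuming they are mutually exclusive, the spec could map to Go types like the following. The type names, field names and the `Validate` method are hypothetical, not Porch's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// TargetPackages holds the three alternative ways of selecting packages from
// the proposed CR. This sketch assumes exactly one of them is set.
type TargetPackages struct {
	List          []string          // Option 1: explicit packagerevision names
	MatchLabels   map[string]string // Option 2: selector (matchLabels only)
	ParentPackage string            // Option 3: all direct downstreams of this package
}

// BulkPackageUpgradeSpec is a hypothetical Go type for the spec of the CR.
type BulkPackageUpgradeSpec struct {
	TargetPackages TargetPackages
	TargetVersion  string // upstream version to move all targets to
}

// Validate rejects specs that set zero or several targeting options, or that
// omit targetVersion.
func (s BulkPackageUpgradeSpec) Validate() error {
	set := 0
	if len(s.TargetPackages.List) > 0 {
		set++
	}
	if len(s.TargetPackages.MatchLabels) > 0 {
		set++
	}
	if s.TargetPackages.ParentPackage != "" {
		set++
	}
	if set != 1 {
		return fmt.Errorf("exactly one of list, selector or parentPackage must be set, got %d", set)
	}
	if s.TargetVersion == "" {
		return errors.New("targetVersion must be set")
	}
	return nil
}

func main() {
	spec := BulkPackageUpgradeSpec{
		TargetPackages: TargetPackages{ParentPackage: "my-package"},
		TargetVersion:  "v45",
	}
	fmt.Println(spec.Validate()) // <nil>
}
```

Enforcing a single option keeps the controller's behavior unambiguous; allowing combinations (for example, a selector scoped to one parent) would be a different design choice.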

For the explicit list of packages, I think we can assume that it is up to the client to provide a set of packagerevisions that can actually be updated, and we'll provide feedback to the client through the status subresource. For the other two options, we need to consider which packagerevisions are "eligible" for updates. For example, if the package my-package in the example above has a large number of old packagerevisions, it doesn't seem necessary to upgrade them all. Maybe just the latest packagerevision should be updated (a possible challenge here is that only published packages are considered for the "latest" tag, which means that updating them requires creating a new packagerevision).
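The "only the latest packagerevision is eligible" rule could be sketched as below. This assumes a simplified, hypothetical `PackageRevision` type with a per-package revision number that increases monotonically, and follows the comment above in only considering published revisions as candidates for "latest".

```go
package main

import (
	"fmt"
	"sort"
)

// PackageRevision is a simplified, hypothetical view of a Porch packagerevision.
type PackageRevision struct {
	Name      string
	Package   string
	Revision  int // assumed to increase monotonically per package
	Published bool
}

// latestPublishedPerPackage keeps only the newest published revision of each
// package, i.e. the revision that would carry the "latest" tag.
func latestPublishedPerPackage(revs []PackageRevision) []PackageRevision {
	latest := map[string]PackageRevision{}
	for _, r := range revs {
		if !r.Published {
			continue
		}
		if cur, ok := latest[r.Package]; !ok || r.Revision > cur.Revision {
			latest[r.Package] = r
		}
	}
	out := make([]PackageRevision, 0, len(latest))
	for _, r := range latest {
		out = append(out, r)
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Package < out[j].Package })
	return out
}

func main() {
	revs := []PackageRevision{
		{Name: "a-1", Package: "a", Revision: 1, Published: true},
		{Name: "a-2", Package: "a", Revision: 2, Published: true},
		{Name: "a-3", Package: "a", Revision: 3, Published: false},
		{Name: "b-1", Package: "b", Revision: 1, Published: true},
	}
	for _, r := range latestPublishedPerPackage(revs) {
		// Published revisions are not edited in place, so each upgrade
		// would start a new packagerevision based on this one.
		fmt.Println("upgrade from", r.Name)
	}
}
```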

We need to think about how we use package vs packagerevision here. I'm not sure I've gotten it right in this proposal.

And yes, each update of a package will be like doing an update on the individual package, so they will all get the additional update task. The controller should handle that as part of updating each package by calling the current API. At some point we might want to give users the choice of doing an update (with the UpdateTask) or using the reclone-and-replay approach.
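A minimal sketch of that controller loop, assuming the per-package update can be abstracted as a function: `UpdateFunc` is a hypothetical stand-in for the existing single-package update API (which would add the UpdateTask), and the results are collected in the shape of `status.packages` from the proposed CR. The "Failed" status value is an assumption; the proposal only shows "Updated".

```go
package main

import (
	"errors"
	"fmt"
)

// PackageStatus mirrors one entry of status.packages in the proposed CR.
type PackageStatus struct {
	Name     string
	Status   string // "Updated" or "Failed" (Failed is an assumed value)
	Revision string // resulting packagerevision, if any
	Message  string
}

// UpdateFunc stands in for the existing single-package update API, which in
// Porch would add an UpdateTask; it returns the resulting revision name.
type UpdateFunc func(pkg, targetVersion string) (string, error)

// errTargetMissing is a sample error used by the fake updater in main.
var errTargetMissing = errors.New("target version not available upstream")

// reconcileBulkUpgrade updates each target package independently and records
// the outcome, so one failing package does not block the others.
func reconcileBulkUpgrade(pkgs []string, targetVersion string, update UpdateFunc) []PackageStatus {
	statuses := make([]PackageStatus, 0, len(pkgs))
	for _, p := range pkgs {
		rev, err := update(p, targetVersion)
		if err != nil {
			statuses = append(statuses, PackageStatus{Name: p, Status: "Failed", Message: err.Error()})
			continue
		}
		statuses = append(statuses, PackageStatus{Name: p, Status: "Updated", Revision: rev})
	}
	return statuses
}

func main() {
	fake := func(pkg, version string) (string, error) {
		if pkg == "broken" {
			return "", errTargetMissing
		}
		return pkg + "-" + version, nil
	}
	for _, s := range reconcileBulkUpgrade([]string{"app-a", "broken"}, "v45", fake) {
		fmt.Printf("%+v\n", s)
	}
}
```

Continuing past failures and reporting them per package is one choice; an all-or-nothing upgrade would need rollback of the revisions already created.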

Porch currently includes one example of a controller: RemoteRootSync. We will probably need several controllers for bulk operations and for declarative cluster targeting, so we need to decide how we want to structure them.

mortent, Aug 11 '22 08:08

> And yes, each update of a package will be like doing an update on the individual package, so they will all get the additional update task. The controller should handle that as part of updating each package by calling the current API. At some point we might want to give users the choice of doing an update (with the UpdateTask) or do the reclone-and-replay approach.

Today, if you do an UpdateTask, I assume that creates a new Draft revision?

Another question: targetVersion is the upstream version, correct? Like:

  • targetPackages: identifies a bunch of packages that have a common upstream
  • targetVersion: identifies the version of the upstream to migrate to (could be upgrade OR downgrade, could skip versions)
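To illustrate that a single targetVersion can mean an upgrade for some downstreams and a downgrade for others, possibly skipping versions, here is a small sketch. It assumes upstream versions have the form `v<N>`, like the v10/v45 examples in this thread; real revision naming may differ, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion reads versions of the assumed form "v<N>".
func parseVersion(v string) (int, error) {
	return strconv.Atoi(strings.TrimPrefix(v, "v"))
}

// direction reports whether moving a downstream from its current upstream
// version to the target is an upgrade, a downgrade, or a no-op.
// Intermediate versions may be skipped.
func direction(current, target string) (string, error) {
	c, err := parseVersion(current)
	if err != nil {
		return "", err
	}
	t, err := parseVersion(target)
	if err != nil {
		return "", err
	}
	switch {
	case t > c:
		return "upgrade", nil
	case t < c:
		return "downgrade", nil
	default:
		return "no-op", nil
	}
}

func main() {
	for _, cur := range []string{"v3", "v10", "v12"} {
		d, _ := direction(cur, "v10")
		fmt.Println(cur, "->", "v10:", d)
	}
}
```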

After the upgrade, the downstream versions may all be different, but they all share the same upstream version: downstream A may be on v56 and downstream B on v28, but both are based on v10 of the upstream.

Honestly, I don't think this is the right approach. An "upgrade" is an imperative operation. I would rather expect a controller that ensures the intended version of each package on each cluster / in each repo. Then a "bulk upgrade" would simply mean changing that intended state for a bunch of packages at once.
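The declarative alternative can be sketched with hypothetical types: the "bulk upgrade" only edits the intended upstream version of many package instances at once, and a reconciling controller acts on whatever differs from the current state. `PackageInstance`, `setIntended` and `drift` are illustrative names, not an existing Porch API.

```go
package main

import (
	"fmt"
	"sort"
)

// PackageInstance is a hypothetical record of one package on one cluster or repo.
type PackageInstance struct {
	Package          string
	Target           string // cluster or repo
	CurrentUpstream  string
	IntendedUpstream string
}

// setIntended is the declarative "bulk upgrade": it only changes intended
// state for every instance of pkg and performs no update itself.
func setIntended(instances []PackageInstance, pkg, version string) {
	for i := range instances {
		if instances[i].Package == pkg {
			instances[i].IntendedUpstream = version
		}
	}
}

// drift returns the instances a reconciling controller would still act on.
func drift(instances []PackageInstance) []string {
	var out []string
	for _, inst := range instances {
		if inst.CurrentUpstream != inst.IntendedUpstream {
			out = append(out, inst.Package+"@"+inst.Target)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	instances := []PackageInstance{
		{"my-package", "cluster-a", "v9", "v9"},
		{"my-package", "cluster-b", "v9", "v9"},
		{"other", "cluster-a", "v2", "v2"},
	}
	setIntended(instances, "my-package", "v10")
	fmt.Println(drift(instances)) // [my-package@cluster-a my-package@cluster-b]
}
```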

I think we need a clearer user journey. What's the operation in the UI and/or CLI that the user is going to take?

johnbelamaric, Aug 11 '22 21:08

Updating a published packagerevision doesn't really make sense, so that will require creating a new packagerevision. I don't remember right now how this is handled in the update logic. But it does mean that as part of updating a package, we need to decide:

  • Should we always make a new packagerevision? If there is a draft revision with a version number that is after the latest published version, maybe we should just update that one instead of creating yet another packagerevision. But it does seem to have some corner cases we need to cover.
  • Is approval of the new package revision also part of the update process? If not, we need to also consider how users can do approval of packagerevisions in bulk.
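One possible rule for the first question, sketched under assumptions: reuse an existing draft only when it is numbered after the latest published revision, otherwise create a new packagerevision. The `Revision` type and the "pick the highest such draft" tie-break are hypothetical; the corner cases mentioned above (such as several drafts) would still need an explicit decision.

```go
package main

import "fmt"

// Revision is a hypothetical, minimal view of a packagerevision.
type Revision struct {
	Name      string
	Number    int
	Lifecycle string // "Draft" or "Published"
}

// chooseRevisionToUpdate returns the draft to update in place, or "" to signal
// that a new packagerevision should be created. A draft qualifies only if it
// is numbered after the latest published revision; the highest one wins.
func chooseRevisionToUpdate(revs []Revision) string {
	latestPublished := -1
	for _, r := range revs {
		if r.Lifecycle == "Published" && r.Number > latestPublished {
			latestPublished = r.Number
		}
	}
	best := Revision{Number: -1}
	for _, r := range revs {
		if r.Lifecycle == "Draft" && r.Number > latestPublished && r.Number > best.Number {
			best = r
		}
	}
	return best.Name
}

func main() {
	revs := []Revision{
		{Name: "my-package-v3", Number: 3, Lifecycle: "Published"},
		{Name: "my-package-v4", Number: 4, Lifecycle: "Draft"},
	}
	if name := chooseRevisionToUpdate(revs); name != "" {
		fmt.Println("update existing draft", name)
	} else {
		fmt.Println("create a new packagerevision")
	}
}
```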

Yeah, targetVersion is the upstream version all packages should be updated to. I'm not sure we need to require that all target packages have the same upstream; they just all need to have targetVersion available in their upstream. Agree that bulk updates of packages that share the same upstream seem to be the most common situation. You are correct that the resulting packagerevision might differ between the packages. The idea is that the resulting version for each package will be available in the status.
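The weaker requirement (each target's own upstream offers targetVersion, without a shared upstream) could be checked up front as sketched below. The `Downstream` type and the map of upstream versions are hypothetical inputs; in practice they would come from the package revisions' upstream references.

```go
package main

import "fmt"

// Downstream is a hypothetical record of a downstream package and the
// upstream package it was cloned from.
type Downstream struct {
	Name     string
	Upstream string
}

// missingTarget lists downstreams whose upstream does not offer targetVersion.
// Downstreams are not required to share an upstream.
func missingTarget(ds []Downstream, upstreamVersions map[string][]string, targetVersion string) []string {
	var missing []string
	for _, d := range ds {
		found := false
		for _, v := range upstreamVersions[d.Upstream] {
			if v == targetVersion {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, d.Name)
		}
	}
	return missing
}

func main() {
	upstreams := map[string][]string{
		"base-a": {"v1", "v2"},
		"base-b": {"v1"},
	}
	ds := []Downstream{{"app-1", "base-a"}, {"app-2", "base-b"}}
	fmt.Println(missingTarget(ds, upstreams, "v2")) // [app-2]
}
```

Packages reported here could be surfaced in the per-package status instead of failing the whole bulk operation.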

The primary CUJ (critical user journey) for this is to allow bulk approval of packages from the UI: users select a set of packages in the GUI and then choose to update them all.

Doing this more like a controller is an interesting idea. It seems most useful if the targetPackages are specified as a selector or by specifying the upstream package, since it would ensure that any newly created packages are picked up automatically. It would require a somewhat different approach in the UI.
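Why a selector gives this "automatic pickup" can be sketched as follows: a controller re-evaluates the selector on every reconcile, so packages created since the last run are included. `matchesLabels` follows Kubernetes `matchLabels` semantics (each selector key must be present with the same value); the surrounding function names and data shapes are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// matchesLabels follows matchLabels semantics: every selector key must be
// present on the package with the same value.
func matchesLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectTargets re-evaluates the selector against the current package list,
// so packages created since the last run are picked up automatically.
func selectTargets(selector map[string]string, pkgLabels map[string]map[string]string) []string {
	var out []string
	for name, labels := range pkgLabels {
		if matchesLabels(selector, labels) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	sel := map[string]string{"package": "my-packages"}
	pkgs := map[string]map[string]string{
		"app-a": {"package": "my-packages"},
		"app-b": {"package": "other"},
	}
	fmt.Println(selectTargets(sel, pkgs)) // [app-a]

	// A new matching package appears; the next reconcile includes it.
	pkgs["app-c"] = map[string]string{"package": "my-packages"}
	fmt.Println(selectTargets(sel, pkgs)) // [app-a app-c]
}
```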

mortent, Aug 12 '22 13:08