kpt icon indicating copy to clipboard operation
kpt copied to clipboard

Support metadata fields on packagerevision resources

Open mortent opened this issue 1 year ago • 7 comments

The PackageRevision resource currently doesn't support several of the metadata fields commonly used to build controllers. This includes labels, annotations, owner references and finalizers. We need to add support for these fields on the packagerevision, packagerevisionresources and package resources.

mortent avatar Sep 10 '22 20:09 mortent

Where

We probably want to store the values of these fields with the rest of the information about the resources, i.e. we need to include this metadata about resources in git and oci.

Relationship with package lifecycle

These fields should also be mutable independently of the lifecycle of packages and revisions. So while the Spec for a published packagerevision can't be updated, it should still be possible to update labels, annotations, owner refs and finalizers.

git

Git is currently the most important and mostly used storage backend for packages and revisions. There are two high-level approaches to how we can store metadata in git:

  • Include it as part of the regular commits for the package. The obvious way to do this would be to include the information in the commit message, similar to how we currently encode information about the package and revision in commit messages to be able to associate a commit with a specific package without having to look at the content of each commit.
  • Store it separately from the regular commits in git. The most obvious solution that comes to mind here is to use git notes, but there might be possible to use a similar approach as git notes, but change it a bit to fit with our needs.

The drawback of including this information in regular commits, is that we will no longer have the 1:1 relationship between commit and changes to the content of a package. It will require that we add new commits every time the metadata changes. If we store the metadata separately in git, for example in git notes, we can make updates to the metadata separately from the regular commits that reflects the changes to the package content.

In both situations, it will not be obvious to users how to add this information to packages when interacting directly with git. But on the other hand, that would be mostly related to content, so manipulating the metadata about a package revision resource will probably mostly happen through porch or the kpt cli. We should make sure we have good porcelain for doing this.

Using git notes (or something similar)

Git notes is essentially just information associated with a commit that is stored on a special branch in git (stored at .git/refs/notes), which contains files with names referenced by the commit ID that they belong to. So notes can only be associated with commits, which isn't necessarily what we want, since a package revision can span multiple commits. But we could look up notes for every commit attached to a package and use that to reconstruct the metadata for the package revision.

So an alternative to using git notes could be to put metadata about package revisions in a special porch branch (something like refs/porch/packagerevisions) and address each entry with the identifier of a package/packagerevision (we need to handle possible deletion and recreation of packages/packagerevisions here, ref https://github.com/GoogleContainerTools/kpt/issues/3467). One possible challenge with this is that we can not simply use the sha for the regular commit as the resource version, since those will not change when we update a note (or something on a separate branch). But we may already have this challenge with the fields of the Status object that are added on-the-fly and not actually stored directly in git.

oci

It seems like this is an easier problem for oci, since every change to a package revision will lead to a new oci image being constructed. Thus, we should be able to always include the metadata in the metadata about the top layer in the image.

mortent avatar Sep 10 '22 21:09 mortent

One possible challenge with this is that we can not simply use the sha for the regular commit as the resource version, since those will not change when we update a note (or something on a separate branch). But we may already have this challenge with the fields of the Status object that are added on-the-fly and not actually stored directly in git.

"metadata.Generation" is important for controllers to know that the version of the resource they are operating on is the most recent. It's also necessary for managing the relationship between status and spec - the observed generation vs the current intended state generation. AFAIK, in ordinary K8s controllers, generation is bumped when metadata changes, but it is up to the specific resource. We need to think about whether metadata should contribute here to the generation or not.

@apelisse may have some good insight here.

See also:

  • https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata
  • https://alenkacz.medium.com/kubernetes-operator-best-practices-implementing-observedgeneration-250728868792
  • https://stackoverflow.com/questions/47100389/what-is-the-difference-between-a-resourceversion-and-a-generation

johnbelamaric avatar Sep 12 '22 18:09 johnbelamaric

Oh - that said, I lean towards a special porch branch for metadata about a package. But we'll need a thorough evaluation.

johnbelamaric avatar Sep 12 '22 18:09 johnbelamaric

Yeah, I hadn't thought about metadata.Generation, but that is obviously also an important field we need to have a plan for. If I remember correctly, it is incremented when the spec changes, as well as when metadata.Labels and metadata.Annotations are changed.

Similar, when I started thinking about Conditions for packagerevisions, we also need a place to store status information for the packagerevision. I think we probably want do handle that in the same way as we do for metadata.

mortent avatar Sep 12 '22 18:09 mortent

Package conditions might end up different from actual resource status conditions. I was imagining them as actually set in either the Kptfile or as a their own separate resource. They may also be surfaced as status conditions, but I think we're going to need them stored in-package. Not 100% on that though.

johnbelamaric avatar Sep 12 '22 18:09 johnbelamaric

FYI, looks like the best practice is to drive generation based on spec only: https://github.com/kubernetes/kubernetes/issues/67428#issuecomment-456125985

johnbelamaric avatar Sep 12 '22 18:09 johnbelamaric

Why not store the metadata related to packagerevision in CRs ?

droot avatar Sep 14 '22 00:09 droot

With https://github.com/GoogleContainerTools/kpt/pull/3655, we have support for the basic metadata fields. Closing this issue.

mortent avatar Nov 15 '22 22:11 mortent