Special behavior for generators
I've concluded that generators need special treatment.
Preprocessors (#2420) are OK, but they are a workaround. #2466 is one issue that has been reported, but there are more.
Let's look at a generator like https://github.com/GoogleContainerTools/kpt-functions-sdk/blob/master/ts/demo-functions/src/expand_team_cr.ts.
- The expansion function is associated with the pseudo-resource-type, not really with a particular package. That type may be used in many packages. It would be helpful to have a registry for that type-to-function mapping (versioning TBD).
- I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources. Other functions I'll just invoke imperatively using kpt fn eval.
- I want to be able to edit generated resources and use merge logic similar to `kpt pkg update`. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package with just the Team resource, re-run the generator, commit and push that package, then `kpt pkg get` that into the main package and update it each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing the generator function's output.
- If resources are no longer generated, they need to be pruned from the filesystem.
cc @justinsb
> It would be helpful to have a registry for that type-to-function mapping
@mengqiy, what is the easiest way to support this in the function catalog?
> I want to use the functions in the Kptfile pipeline similarly to Kubernetes mutating and validating admission control. They should re-run on generated resources.
Thinking of some possible options here:
- As soon as new resources are detected after executing a function, restart the pipeline with the currently modified resources, including the new ones (note that the resources are not written to disk yet; they are written at the end of a successful pipeline execution).
- As soon as new resources are detected after executing a function, continue with the pipeline execution, and then re-run the pipeline with only the generated resources. [Executing functions with only the new resources may not be desirable, because functions assume the entire pkg (resourceList) is available to them.]
- If new resources have been generated by the end of the pipeline run (note that resources are written to disk after a successful run), re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level).

The third option is the least disruptive and the easiest to reason about.
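A minimal sketch of that third option, assuming a hypothetical `Pipeline` callback and a simplified resource-identity type in place of kpt's real render machinery:

```go
package main

import "fmt"

// ResourceID is a hypothetical stand-in for how kpt identifies an
// object in a package (group/kind/namespace/name plus path annotations).
type ResourceID struct{ Group, Kind, Namespace, Name string }

// Pkg maps resource identity to serialized YAML.
type Pkg map[ResourceID]string

// Pipeline runs every Kptfile mutator over the package once.
type Pipeline func(Pkg) (Pkg, error)

const maxPasses = 10 // guard against generators that never converge

// render sketches option 3: after a successful pass, re-run the whole
// pipeline if the pass introduced resources that earlier functions have
// not seen yet; results are written to disk only after convergence.
func render(pkg Pkg, run Pipeline) (Pkg, error) {
	for pass := 0; pass < maxPasses; pass++ {
		out, err := run(pkg)
		if err != nil {
			return nil, err
		}
		fresh := false
		for id := range out {
			if _, ok := pkg[id]; !ok {
				fresh = true // a generator emitted a new resource
				break
			}
		}
		pkg = out
		if !fresh {
			return pkg, nil // converged: safe to write to disk now
		}
	}
	return nil, fmt.Errorf("pipeline did not converge after %d passes", maxPasses)
}

func main() {
	// Toy pipeline: a "generator" that idempotently emits one Namespace.
	gen := func(p Pkg) (Pkg, error) {
		out := Pkg{}
		for k, v := range p {
			out[k] = v
		}
		out[ResourceID{Kind: "Namespace", Name: "team-a"}] = "apiVersion: v1\nkind: Namespace"
		return out, nil
	}
	result, err := render(Pkg{}, gen)
	fmt.Println(len(result), err) // 1 <nil>
}
```

The pass cap is one way to keep a non-converging generator from looping forever; some such guard would presumably be needed whichever option is chosen.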
> I want to be able to edit generated resources and use merge logic similar to `kpt pkg update`. In this example, Namespaces frequently need to be labeled and annotated, so the function in its current form is useless. One needs to create a separate package with just the Team resource, re-run the generator, commit and push that package, then `kpt pkg get` that into the main package and update it each time the function is run. Because kpt doesn't support local packages AFAIK, there can't really be a review/approval workflow around updates to the package containing the generator function's output.
some early thoughts:
- The expand_team function can be written in an `ensure_team` form that is idempotent and takes the Namespace and IAMRoles resources that are part of the input resourceList (a sketch follows this list). The `ensure` form of the function acts only on the attributes of resources that are driven by the pseudo-resource. I see this behavior as closer to how Kubernetes controllers work (multiple controllers acting in a coordinated manner, each patching the bits it cares about). Obviously, writing the `ensure` form of a function is more work than writing a purely generative function; it's a way of saying the functions take care of the merge part.
- kpt could support `merge` behavior for functions, similar to server-side-apply functionality, where `expand_team` and user edits are treated as changes by different actors on the same resources. To do this cleanly, we will have to come up with some way of tracking ownership of changes.
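To make the first bullet concrete, here is a minimal, hypothetical sketch of an `ensure`-style function over a drastically simplified object model; it creates the Namespace only if absent and reconciles only the labels the Team pseudo-resource owns, so user-added labels survive re-runs:

```go
package main

import "fmt"

// Obj is a drastically simplified stand-in for a KRM object in the
// function's input resourceList.
type Obj struct {
	Kind, Name string
	Labels     map[string]string
}

// ensureTeamNamespace sketches an idempotent "ensure" variant of
// expand_team: it creates the Namespace only if absent, and patches
// only the fields driven by the Team pseudo-resource.
func ensureTeamNamespace(items []Obj, team string) []Obj {
	want := map[string]string{"team": team} // attributes owned by the generator
	for i := range items {
		if items[i].Kind == "Namespace" && items[i].Name == team {
			if items[i].Labels == nil {
				items[i].Labels = map[string]string{}
			}
			for k, v := range want {
				items[i].Labels[k] = v // patch only the owned bits
			}
			return items // already present: nothing duplicated
		}
	}
	return append(items, Obj{Kind: "Namespace", Name: team, Labels: want})
}

func main() {
	items := []Obj{{Kind: "Namespace", Name: "payments",
		Labels: map[string]string{"env": "prod"}}} // user-added label
	items = ensureTeamNamespace(items, "payments")
	fmt.Println(items[0].Labels) // map[env:prod team:payments]
}
```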
I realized our SDKs can also make writing `ensure` (or idempotent) functions easier.
> If new resources have been generated by the end of the pipeline run (note that resources are written to disk after a successful run), re-run the pipeline on the pkg (this behavior could be configurable at the top pkg level)
This is the direction I think would make the most sense, and would be the simplest for other functions to reason about.
> I realized our SDKs can also make writing `ensure` (or idempotent) functions easier.
Yes, mainly I think we need an `ensure`/`generate` SDK that helps with:
- Merging changes between the function, other functions, and user manual edits
- Generation and removal of resources. `generate-folders` is another function that needs to delete managed resources if they are removed from the master resource.
Additional context #2435
kustomize has issues with the generator pattern also: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/2299-kustomize-plugin-composition
> Yes, mainly I think we need an `ensure`/`generate` SDK that helps with:
> - Merging changes between the function, other functions, and user manual edits
OK, so I am adding this caveat after writing everything below, which is just a thought process working through some of these issues; I am still not clear on how we achieve this and don't have time to keep thinking about it now. If the musings below are not useful, feel free to ignore them. Perhaps you already have some ideas on a better way to do this.
I have a use case for a variant generator that would, for example, punch out a set of variants of a workload that are specialized to particular clusters. It works something like this:
Variant Generator Function Inputs
- A selector for a set of resources (think: all resources in a given directory; could even be a package)
- An input CRD with an array of structs (think: fields with setter values, or inputs to bits of more complex logic)
Behavior (see the sketch after this list)
- Select the resources according to the selector, resulting in a set of resources we will call A
- For each entry in the array:
  - Duplicate A
  - Apply transformations based on the struct
  - Store the result An in its own directory (i.e., modify the directory annotation)
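A rough sketch of that behavior, with a simplified `Resource` type and a plain `Path` field standing in for the directory annotation (all names here are hypothetical):

```go
package main

import "fmt"

// Resource is a simplified KRM object; Path plays the role of the
// config.kubernetes.io/path annotation that decides the output file.
type Resource struct {
	Kind, Name, Path string
	Data             map[string]string
}

// Variant holds the per-variant inputs (one entry of the input CRD's array).
type Variant struct {
	Name     string
	Settings map[string]string // e.g. setter values
}

// expandVariants duplicates the selected set A once per variant,
// applies the variant's settings, and routes each copy An into its
// own directory via the path.
func expandVariants(a []Resource, variants []Variant) []Resource {
	var out []Resource
	for _, v := range variants {
		for _, r := range a {
			dup := r
			dup.Data = map[string]string{}
			for k, val := range r.Data {
				dup.Data[k] = val
			}
			for k, val := range v.Settings {
				dup.Data[k] = val // transform based on the struct
			}
			dup.Path = v.Name + "/" + r.Path // store An in its own dir
			out = append(out, dup)
		}
	}
	return out
}

func main() {
	a := []Resource{{Kind: "Deployment", Name: "app", Path: "app.yaml",
		Data: map[string]string{"replicas": "1"}}}
	vs := []Variant{{Name: "us-east", Settings: map[string]string{"replicas": "3"}}}
	fmt.Println(expandVariants(a, vs)[0].Path) // us-east/app.yaml
}
```

As written, each run regenerates every An from scratch, which is exactly the problem described next.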
I'd like the user to be able to subsequently modify An to produce An'. When the function is re-run, I do not want to overwrite those edits. This can't be done with the approach above, because it simply regenerates the entire An on each run, so I cannot distinguish a delta between An and An' caused by a user edit from one caused by a change in the function inputs (or code).
My initial thought for solving that was to use patch files instead of an array of structs: one set of patches per variant. That is, have the user create a patch that, when applied to A, produces An. The function would then simply process each of these patches and apply those deltas to An' if it exists (or create An by applying the patch to A).
I think that would work, but it shifts the burden of variant generation to the user. Perhaps we can have two functions: one that generates the patch, and one that applies it to create An?
Patch Generator Function Inputs:
- A selector for a set of resources
- An input CRD with an array of structs
Behavior:
- Select the resources according to the selector to get the set A
- For each entry in the array:
  - Duplicate A
  - Apply transformations based on the struct to get An
  - Calculate a patch Pn that produces An
Variant Generator Function Inputs:
- Hmm...some sort of selectors to identify Pn, and An, An' resources?
Behavior (see the sketch after this list):
- For each Pn:
  - If An' exists, apply the patch; otherwise create An' as equal to An
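As a sketch of that second function, flattening resource sets and patches down to plain key/value maps (real functions would use strategic-merge or JSON patches over whole objects):

```go
package main

import "fmt"

// Set and Patch are flat stand-ins: a "resource set" and a "patch" are
// both key/value maps here, purely for illustration.
type Set map[string]string
type Patch map[string]string

// applyVariant sketches the apply step: given the base set A, the
// patch Pn, and the possibly user-edited An' (nil if absent), it
// re-applies only the fields the patch owns, so non-conflicting user
// edits survive.
func applyVariant(a Set, p Patch, current Set) Set {
	if current == nil { // An' does not exist yet: create An = A + Pn
		current = Set{}
		for k, v := range a {
			current[k] = v
		}
	}
	for k, v := range p {
		current[k] = v // re-apply only the patched fields
	}
	return current
}

func main() {
	a := Set{"replicas": "1", "image": "app:v1"}
	p := Patch{"replicas": "3"}
	an := applyVariant(a, p, nil)  // first run creates An
	an["image"] = "app:v1-patched" // user edit produces An'
	an = applyVariant(a, p, an)    // re-run preserves the user edit
	fmt.Println(an)                // map[image:app:v1-patched replicas:3]
}
```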
I think with this approach, re-running the function will not overwrite non-conflicting user edits of An'. Conflicting edits are still an issue, though. Worse still, we have issues if we modify the generator function or the set of resources in A, because our patches will then potentially not contain some new resource or value from the modification, so that change will never get added to the generated resources. That is, if we add a resource such that the selected set, say A1, now has additional resources compared to A, the next run will calculate a patch between A1 and A1n, and since the patch won't have the new resource, it will never be applied to An'.
I think we have to take it a step further and store the patches, along with some sequencing, so that we can re-apply them in the proper order. Or maybe even that won't work, and instead we need to store the original An from the previous version of the function. I need to think about it more, but this seems to be getting out of hand.
Other approaches?
@johnbelamaric Have you considered modeling this as a set of subpackages? (See go/package-sets.)
Yes, now that you mention it I did at some point look at that and it looked promising. I'll have to go back and revisit it. Has anything with respect to that been implemented?
Not yet, the main blocker right now is the challenge of cyclical `kpt fn render`: when generators insert new resources (including a Kptfile with new functions), we need a mechanism to add functions on those resources.
Ah, yes, I handle that in pre-v1.0 kpt with a two-pass pipeline. I haven't looked at v1.0 kpt closely enough; hopefully there will be a similar solution available.
Thinking about this a little more, it does seem that subpackage is the right model. The explicit version in the Kptfile gives us the ancestor for the 3-way merge, so we can make edits via the generator function, via other functions, and via manual user edits of the resulting generated instance.
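To illustrate, here is a field-level sketch of that 3-way merge, with flat maps standing in for resources (the ancestor being the generator output recorded at the pinned version, upstream the newly generated output, and local the user-edited copy); real package merges operate on structured YAML:

```go
package main

import "fmt"

// merge3 sketches the 3-way merge rule: local edits win; otherwise
// upstream changes (including deletions) apply.
func merge3(ancestor, upstream, local map[string]string) map[string]string {
	out := map[string]string{}
	keys := map[string]bool{}
	for _, m := range []map[string]string{ancestor, upstream, local} {
		for k := range m {
			keys[k] = true
		}
	}
	for k := range keys {
		a, aok := ancestor[k]
		u, uok := upstream[k]
		l, lok := local[k]
		switch {
		case lok && l != a: // user changed or added the field: keep the edit
			out[k] = l
		case !lok && aok: // user deleted the field: keep it deleted
		case uok: // no local change: take the (possibly new) upstream value
			out[k] = u
		}
	}
	return out
}

func main() {
	anc := map[string]string{"replicas": "1", "image": "app:v1"}
	up := map[string]string{"replicas": "1", "image": "app:v2"}  // generator bumped image
	loc := map[string]string{"replicas": "5", "image": "app:v1"} // user bumped replicas
	fmt.Println(merge3(anc, up, loc)) // map[image:app:v2 replicas:5]
}
```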
One issue for generators: shouldn't they end up needing to work something like apply, with respect to pruning, for example? If we alter the generator function or its inputs such that we get fewer instances of a given subpackage, we'll need to prune the old packages. We could manage this with function-specific logic (for example, "this generator function owns this directory"), but it seems we should have some utility, or at least conventions, here.
> One issue for generators: shouldn't they end up needing to work something like apply, with respect to pruning, for example?
Yes, though they're actually also often similar to owner references. The approach we've taken thus far is to use an annotation to track the source resource and prune if it's changed.
We should definitely standardize this sort of functionality into the SDK, though. It's a pain for each function author to write and maintain the logic, and implementations might become inconsistent.
Back to the original topic:
Ensure functions make sense to me. That's similar to https://github.com/metacontroller/metacontroller functions.
It could be a supported or at least recommended pattern, like variant constructors (#2590).
However, my eventual conclusion on expand_team_cr was that it wasn't sufficiently useful for declarative use. Instead, I switched to a minimalistic variant-constructor approach, using the ensure pattern for just the Namespace: https://github.com/GoogleContainerTools/kpt/issues/2184#issuecomment-949863947
For abstractions to be worthwhile, they need to dramatically reduce configuration complexity or implement some significant business logic. Otherwise, we should think in terms of manipulating the underlying resource types, with resource-type-specific functions and tools, as imperative tools typically do.
I agree that variant generators could be modeled as generated subpackages, which could leverage the variant constructor pattern and specify variant resources declaratively rather than generating them from scratch. We should spawn another issue for that, if there isn't one already. The variant generation function could generate Kptfiles for the subpackages. We'd then need to trigger cloning of the subpackages, and reinvoke the pipeline to run the variant constructors.
I am generally not a fan of monolithic arrays or maps as ways to express desired variants. They seem appealing for specifying small numbers of variants with small numbers of varying attributes by hand, but are difficult to manage at non-trivial scale. Probably the set of variants would be derived from some input source using some automation. I'd represent them like other generators, using client-side custom resources, one per variant or per dimensional variation value (e.g., region and environment). Convention over configuration is our friend for these scenarios so that users don't need to specify where the varying attributes come from for each variant. Storing sets of varying attributes or "facts" is common in hierarchical parameter stores. We can use CRs (pseudo or actual) or ConfigMaps for this, which also should make custom UX for such use cases simpler to build.
For creating individual resources, imperative functions, the CLI (e.g., `kubectl create -o yaml`), a UI, etc. seem like the way to go.
An example: https://github.com/GoogleContainerTools/kpt-functions-catalog/tree/master/functions/go/enable-gcp-services
This sets a `blueprints.cloud.google.com/ownerReference` annotation, which it uses to prune resources that are no longer needed: https://github.com/GoogleContainerTools/kpt-functions-catalog/tree/master/examples/enable-gcp-services-advanced#overview
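A sketch of that annotation-based pattern, with a simplified object model (the annotation key is the one the function documents; everything else here is hypothetical):

```go
package main

import "fmt"

// Annotation used by enable-gcp-services to mark generated resources.
const ownerAnno = "blueprints.cloud.google.com/ownerReference"

type Obj struct {
	Kind, Name  string
	Annotations map[string]string
}

// prune keeps everything not owned by this generator, then emits the
// freshly generated set; owned resources that were not regenerated
// simply drop out of the package.
func prune(items, generated []Obj, owner string) []Obj {
	var out []Obj
	for _, o := range items {
		if o.Annotations[ownerAnno] != owner {
			out = append(out, o) // not ours: leave untouched
		}
	}
	return append(out, generated...)
}

func main() {
	items := []Obj{
		{Kind: "Service", Name: "old-api",
			Annotations: map[string]string{ownerAnno: "project-services"}},
		{Kind: "ConfigMap", Name: "user-owned"},
	}
	generated := []Obj{{Kind: "Service", Name: "new-api",
		Annotations: map[string]string{ownerAnno: "project-services"}}}
	for _, o := range prune(items, generated, "project-services") {
		fmt.Println(o.Kind, o.Name) // ConfigMap user-owned, then Service new-api
	}
}
```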
Ah, the merge behavior was previously requested in #954
The question inevitably comes up regarding what to do when a generator function is updated.
It depends on whether changing the behavior of existing uses is desirable or not. It also depends on whether the generator would just support additional optional, parameterized attributes. Helm charts often conditionalize new attributes so that they don't affect existing uses of the chart automatically. Changes of default behavior are also typically undesirable for existing uses. So changing a generator is often not the best way to enforce new behaviors.
That said, we may want to support upgrades to new generator versions similarly to how we support new upstream package versions. One way to do that would be to update the function version in upstream packages first, then update their downstream packages. We'd want to surface available function upgrades similarly to available package upgrades. This would be easier if we separated upstream and downstream functions (#2544), as well as generators and admission control (#3121).
> That said, we may want to support upgrades to new generator versions similarly to how we support new upstream package versions.
One quick thought: I can imagine a generator function representing a subset of the resources of the package (sort of a slice of the package), sourced by running a specific version of the function, similar to `pkg get <pkg-version>`. This is very similar to having a subpackage whose upstream is a generator-function:version and its input, and changing the version is similar to upgrading to a new version of that upstream.
The issue of pruning also applies to transformers that change resource identifiers, in the case where the identifier transformation changes across revisions.
Drive-by observation: Infrastructure from Code projects that generate Infrastructure as Code formats are going to have similar challenges. Users will want to customize the generated IaC code/templates, but those changes will be stomped the next time the IaC is regenerated. The idea of generating app config from app code is not a terrible one; `oc new-app` is a simple version of that.
Something of interest here is what @henderiw and others are doing in Nephio. There, we are using functions to generate resources, as well as using the package conditions feature of Porch. This means we need to operate a lot like a controller: we need to create, update, and prune child resources based upon one or more inputs. So they are working on a controller-runtime-like function SDK that handles various aspects of that for function authors.
Wim - can you link in your slides / maybe the PR? This is very early work, but in a month or so it may be good to discuss in kpt office hours.
Attached is the link to the slides:
https://docs.google.com/presentation/d/1xKZM4Q_auoUMb6M4I_OTNlS7xu7K6cVm2F_FQg217lU/edit?usp=sharing