Document how to approach abstraction

Open bgrant0607 opened this issue 3 years ago • 1 comment

Another reminder for myself, after discussing with @johnbelamaric.

Related to #3145, #3131, #2528

In kpt, packages are transparent bundles of resources that can be introspected and operated upon. They may CONTAIN resources (in-cluster or local) that represent abstractions, but they do not HAVE abstractions as a whole. Packages are not encapsulated. No parameter-based interfaces. WYSIWYG.

Some alternatives to abstraction are discussed in #3145. Thin abstractions get in the way, preventing standard resource-type-aware tools from interoperating on the configuration data. Just eliminating boilerplate and formatting YAML is not enough -- those could be done easily enough by GUIs, CLIs, and IDEs. Most parameterized attributes in off-the-shelf templates are just single attributes.

In order to carry its own weight, an abstraction needs to provide high leverage, such as a 10x or more expansion factor -- that is, the output is 10x larger than the inputs. Small percentages and even small factors rarely pay off. Some example platform interfaces have had expansion factors as high as 30x. This is a reasonable example, with 15x expansion: https://medium.com/pinterest-engineering/building-a-kubernetes-platform-at-pinterest-fb3d9571c948
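As a rough illustration of the expansion-factor idea, here is a minimal sketch that compares the size of an abstraction's input with the size of the configuration generated from it. Counting non-blank, non-comment lines is only one possible measure, and the function names and sample data are hypothetical, not part of kpt.

```python
def count_content_lines(text: str) -> int:
    """Count non-blank, non-comment lines in a YAML-like document."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def expansion_factor(input_text: str, output_text: str) -> float:
    """Ratio of generated output size to abstraction input size."""
    inputs = count_content_lines(input_text)
    if inputs == 0:
        raise ValueError("input has no content lines")
    return count_content_lines(output_text) / inputs


# Hypothetical two-line input to an abstraction.
abstraction_input = """\
# what the user writes
name: web
image: example.com/web:1.0
"""

# Stand-in for 30 lines of generated resources.
generated_output = "\n".join(f"field{i}: value" for i in range(30))

factor = expansion_factor(abstraction_input, generated_output)
print(f"{factor:.0f}x")  # prints "15x"
```

By this measure, an abstraction that turns 2 lines into 3 or 4 would score under 2x, which is the kind of small factor that rarely pays off.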

High-leverage, durable abstractions that are resistant to erosion are often hard to design. They require specializing on particular use cases, requirements, assumptions, and so on. Simplicity and flexibility are opposing requirements.

Consequently, one tactic to mitigate abstraction erosion is to be willing to create multiple targeted abstractions rather than attempt to make a single abstraction flexible enough to handle multiple use cases. The Kubernetes workload APIs (ignoring the PodTemplate part - issues.k8s.io/170) are an example: ReplicaSet, Deployment, StatefulSet, DaemonSet, Job, CronJob.

Where there is high fan-out or fan-in of information sources and targets, optionality can be reduced by decoupling producers and consumers. Service discovery and DNS are examples: IP addresses can be allocated by a variety of producers and discovered and consumed by a variety of clients, all interacting through the standardized naming layer.
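The decoupling can be sketched with a toy naming layer. This is a simplified in-memory stand-in, not how DNS or Kubernetes service discovery is implemented; the class, method names, and addresses are all hypothetical.

```python
class NameRegistry:
    """Toy naming layer: producers publish addresses, consumers resolve names."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def publish(self, name: str, address: str) -> None:
        # Any producer may register an address; consumers never see who did.
        self._records[name] = address

    def resolve(self, name: str) -> str:
        # Consumers depend only on the name, not on the allocator.
        return self._records[name]


registry = NameRegistry()

# Two unrelated producers allocate addresses in their own ways.
registry.publish("db.example.internal", "10.0.0.12")     # e.g. a static assignment
registry.publish("cache.example.internal", "10.0.3.40")  # e.g. a dynamic allocator

# A consumer needs no per-producer options, only the name.
print(registry.resolve("db.example.internal"))  # prints "10.0.0.12"

# A producer can change the address without any consumer-side change.
registry.publish("db.example.internal", "10.0.0.99")
```

With N producers and M consumers, each side integrates once with the naming layer instead of with every counterpart.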

Interfaces can also be partitioned to separate concerns. The "claim" model in Kubernetes uses this approach. PersistentVolumeClaims represent a subset of PersistentVolume information, referring indirectly to a set of desired attributes using StorageClass. gateway-api.sigs.k8s.io, oam.dev and Crossplane compositions use similar models to partition and amortize complexity. Even Service and Deployment are an example of separation of concerns.
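A minimal sketch of the claim pattern, assuming heavily simplified types: the class, field, and function names below are illustrative and are not the Kubernetes API. The point is only that the claim author supplies a few attributes and a class name, while the class owner supplies the details.

```python
from dataclasses import dataclass


@dataclass
class StorageClass:
    """Owned by the platform team: carries the infrastructure details."""
    name: str
    provisioner: str
    parameters: dict


@dataclass
class Claim:
    """Owned by the application team: a small subset of volume information."""
    name: str
    storage_class: str  # indirect reference to the desired attributes
    size_gib: int


@dataclass
class Volume:
    """The fully specified result, combining both concerns."""
    name: str
    size_gib: int
    provisioner: str
    parameters: dict


def provision(claim: Claim, classes: dict) -> Volume:
    """Fill in a volume from a claim plus the class it refers to."""
    cls = classes[claim.storage_class]
    return Volume(
        name=f"pv-{claim.name}",
        size_gib=claim.size_gib,
        provisioner=cls.provisioner,
        parameters=dict(cls.parameters),
    )


# Hypothetical class definition and claim.
classes = {
    "fast": StorageClass("fast", "example.com/ssd", {"type": "ssd", "replication": "3"}),
}
volume = provision(Claim(name="data", storage_class="fast", size_gib=20), classes)
print(volume.provisioner)  # prints "example.com/ssd"
```

The complexity in the class definition is written once and amortized over every claim that references it.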

Automation can enable higher-level abstractions, by filling in details of lower-level abstractions through reactive, predictive, or ML-based techniques. Autoscaling is the canonical example.
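For the autoscaling case, the core rule documented for the Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule with a simple min/max clamp. It omits the tolerance, stabilization, and readiness handling of the real controller, and the function and parameter names are illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Reactive scaling rule: scale in proportion to metric / target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the bounds the user declared; everything else is automated.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% average utilization with a 60% target.
print(desired_replicas(4, 90, 60))  # prints 6
```

The user states a target and bounds; the replica count, a lower-level detail, is filled in continuously from runtime state.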

This list is not exhaustive.

Each input to a high-leverage abstraction ideally should be put to many uses. The highest-leverage piece of information is the identity of the abstraction itself, and it should enable many assumptions to be made about the desired state being requested.
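To make this concrete, here is a minimal sketch of a hypothetical "web app" abstraction. The function name, the default port, and the sample values are assumptions for illustration; the generated dictionaries follow the general shape of Kubernetes Deployment and Service manifests but are not complete or validated.

```python
def web_app(name: str, image: str, replicas: int = 2) -> list:
    """Expand two inputs into a Deployment and a Service."""
    labels = {"app": name}
    # Choosing the "web app" abstraction implies assumptions such as this port.
    port = 8080
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "selector": dict(labels),
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    return [deployment, service]


resources = web_app("checkout", "example.com/checkout:1.0")
print([r["kind"] for r in resources])  # prints "['Deployment', 'Service']"
```

Here the single `name` input is reused for both resource names, the container name, and every label and selector, while the choice of abstraction alone determines the resource kinds, ports, and wiring between them.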

Jun 04 '22 02:06 bgrant0607

I neglected to say that abstractions should be defined using custom resources, either actually defined server-side using CRDs, or local/client-side. We'll need a registry for client-side CRDs.

When to use one or the other? Package orchestration blurs the lines regarding what types of client surfaces and automation are possible with client-side abstractions, but there are some situations where one approach or the other is more suitable.

Client-side / local is:

- Easier to customize and extend, and amenable to more customization strategies
- More transparent -- the output can be pre-validated, reviewed, and approved

Use server-side when you want to:

- Encapsulate the generation logic and restrict further customization
- Support multiple runtime client surfaces
- Support real-time automation on top of the abstraction that uses runtime state information
- Control external resources as well

Server-side is also a better fit when change control, versioning, and rollback of generated resources are not needed.

Jun 04 '22 04:06 bgrant0607