[Experimental] Object reference resolver to support non-Addressable objects
Description
This proposal is about extending the KReference resolver to support non-addressable objects (no `status.address` field). For those objects, we propose to ~~delegate the reference resolution to an external service which, upon receiving a KReference object, replies with a resolved URI or a "not supported" error~~ allow people to define a mapping between "Kind" and "URL-Template", where the URL-Template can use fields from the object being referenced.
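For illustration, a mapping ConfigMap could look something like the sketch below. The key format and the available template fields are assumptions made for this example, not a final schema (the actual `kreference-mapping.yaml` work item appears in the feature stages plan further down):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kreference-mapping
  namespace: knative-eventing
data:
  # Hypothetical entry: resolve JobDefinition references to a job-runner service.
  JobDefinition.v1alpha1.my.batch.job: "https://jobrunner.system-ns.svc.cluster.local/{{ .Name }}"
```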
Use Case
The driving use case: as a user, I want to be able to trigger a batch job every day by having a PingSource CR target a batch job definition:
IMPORTANT NOTE: this is just an example! The source can be anything (GitHub, Kafka, you name it) and JobDefinition can also be anything that is not Addressable.
```yaml
apiVersion: my.batch.job/v1alpha1
kind: JobDefinition
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
---
apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: trigger-job
spec:
  schedule: "@daily"
  sink:
    ref:
      apiVersion: my.batch.job/v1alpha1
      kind: JobDefinition
      name: pi
```
Exit Criteria
Users are able to reference non-addressable objects.
Experimental flag name: kreference-custom-resolvers
Experimental feature stages plan
Below is the proposed plan for the feature stages (this list implicitly includes the requirements defined in the process):
- Alpha:
  - [x] Extend `URIFromObjectReference` to support custom resolvers in knative/pkg
  - [x] Add kreference-mapping experimental flag
  - [x] Add a new configmap (kreference-mapping.yaml) to configure mappings between "Kind" and "URL-Template"
  - [x] Implement kind -> URL mapping URI resolver (a minimal sketch appears after this plan)
  - [x] PingSource uses the new custom resolver
  - [ ] Track reference
  - [ ] All controllers relying on the sink resolver use the new custom resolver
  - [ ] User documentation
- Beta graduation as soon as one release after inception
- Beta:
- [ ] User documentation stabilization and improvements
- [ ] More e2e tests
- [ ] Add conformance tests
- Stable graduation as soon as two releases after the Beta graduation
- Stable:
- [ ] Add the requirement to the spec
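To make the mapping resolver concrete, here is a minimal, self-contained Go sketch of the core idea referenced in the Alpha items above: look up a URL template keyed by the referenced object's kind/version/group and render it with fields from the reference. The key format, field names, and URLs are assumptions for illustration; this is not the knative/pkg implementation.

```go
// Sketch of kind -> URL-template resolution. The mapping key format
// ("Kind.version.group") and the template fields are assumptions.
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// ref carries the KReference fields a URL template might use.
type ref struct {
	Name      string
	Namespace string
	UID       string
}

// resolve looks up the URL template registered for the given key and
// renders it with the referenced object's fields.
func resolve(mappings map[string]string, key string, r ref) (string, error) {
	tmplStr, ok := mappings[key]
	if !ok {
		return "", fmt.Errorf("no mapping registered for %q", key)
	}
	tmpl, err := template.New(key).Parse(tmplStr)
	if err != nil {
		return "", fmt.Errorf("invalid template for %q: %w", key, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, r); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical entry, e.g. loaded from the kreference-mapping ConfigMap.
	mappings := map[string]string{
		"JobDefinition.v1alpha1.my.batch.job": "https://jobrunner.system-ns.svc.cluster.local/{{ .Namespace }}/{{ .Name }}",
	}
	uri, err := resolve(mappings, "JobDefinition.v1alpha1.my.batch.job",
		ref{Name: "pi", Namespace: "default"})
	if err != nil {
		panic(err)
	}
	fmt.Println(uri) // https://jobrunner.system-ns.svc.cluster.local/default/pi
}
```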
Affected WG
- Eventing WG
- Networking WG
/cc @duglin @cr22rc @n3wscott
Questions, since I missed the presentation:
- How does this work with ContainerSource / SinkBinding, which today takes a `K_SINK` URL?
- Is there a mechanism to re-trigger the address resolution if the backend mapping of the resource -> URL changes?
- The given example is a custom resource, which seems like it could implement `status.address` fairly easily. Are there compelling use cases where storing this information in a resource (either `JobDefinition` or an `EventTriggeredJob` which has a ref to a `JobDefinition` and acts as an adapter) doesn't work?
- Is there a Feature Track doc that explains all this, including how it's rolled out across in-tree and out-of-tree components, which might be in multiple languages (and may not be released on the Knative release cadence or using Knative libraries)?
All, thanks for the chat on today's WG call. A couple of additional thoughts:
- Obviously this isn't just about the PingSource, it's just a simple example. Any Source could need to be connected to some non-duck-typed-Addressable resource as its Sink.
- This also doesn't just apply to Sources; Triggers would fall into the same problem space. Making this change in `pkg` would help make support for this possible with minimal changes. But, of course, any Source that's not leveraging `pkg` would need its own custom support for this - but that's no different than them needing to special-case K8s Services today.
- While this issue talks about a "reference resolver service", it's also possible that we could look at other mechanisms to support this extensibility. For example, perhaps we have some kind of ConfigMap that allows people to define a mapping between "Kind" and "URL-Template", where the URL-Template can use fields from the Object being referenced. For example, `batchJob.example.com/v1 -> https://jobrunner.system-ns.svc.cluster.local?uid={{.metadata.uid}}`. Or support both, a mapping and a service. We don't need to decide this now; I think that's part of the brainstorming/experimentation phase, once we get past the "do we want to do this?" phase.
- Other things we'll need to discuss:
  - security of the service to ensure only allowed components can call it
  - how to deal with the resolver service (or mapping ConfigMap) changing the URL of the related resources
  - how to deal with the resolver service itself moving to a new location
- Other possible use cases include:
  - event -> buildDefinition (e.g. Shipwright build definition)
  - event -> workflowDefinition
Thanks @evankanderson for looking at this. Answers:
- The same way as today: the `K_SINK` URL is resolved by the SinkBinding controller which, under the cover, relies on `URIFromObjectReference`.
- No.
- It's not easy if the custom resource is out of our control and owned by a different community. The next logical step is indeed to have an adapter (`EventTriggeredJob`, or a parameterizable service), which is perfectly fine but from a UX perspective might not be ideal. Instead of "a reference to a reference to an object", what about letting users reference the object directly?
- I'm not envisioning Knative providing a reference implementation of an external KReference resolver. This proposal is just about providing an extension point.
- Disclaimer: I'm 100% fine adding this as an experimental feature, which is our vehicle for innovation with safety. (P.S. According to our experimental feature process, you can already go ahead and provide an implementation as part of the buy-in phase and before kicking off the lazy consensus phase.)
- From what I understood, this is about providing a user experience that basically allows users to specify any arbitrary resource as a `Sink` or a `Subscriber`, given that such a resource exposes a webhook at an arbitrary URI and does not necessarily expose such URI in its status.
- So basically, this is asking to change the sources spec so that it's not limited to `Addressable` sinks.
- One goal of eventing and Knative is to preach some of its concepts all over the K8s community, which we believe:
  - Provide a seamless UX with less cognitive load.
  - Are more intuitive and easy to reason about.
  - Facilitate better interop and integration between heterogeneous k8s applications.
- One of those concepts is `Addressable` resources, a core building block and concept of Event Driven Architectures using Knative.
- I worry that if we allow non-addressables, people would default to those and we'd lose any edge that comes with Knative adoption to preach addressable resources, but one might argue the addressable restriction is an adoption barrier in its own right. So it's a chicken-or-egg problem. I don't have an answer to that personally.
- One solution that I was hoping our vendors would embrace would be for the platform to provide middleware automation that gives their users a smoother UX while still adhering to and preaching Knative concepts.
- Based on that, I wonder if instead of totally changing the spec to allow non-addressables, we can introduce a new concept of "addressable adapters" or "sink adapters", which means a source sink can be:
  - an addressable Ref
  - a URI
  - an addressable/sink adapter that refers in turn to a non-addressable resource, where vendors can provide their own adapter controllers while providing their users with a very similar UX as the one proposed:
```yaml
apiVersion: my.batch.job/v1alpha1
kind: JobDefinition
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
---
apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: trigger-job
spec:
  schedule: "@daily"
  sink:
    adapter:
      apiVersion: my.batch.job/v1alpha1
      kind: JobDefinition
      name: pi
```
- I can imagine there can be many ways to achieve the above, maybe a generic lower-level CRD such as `SinkAdapter` with class-based controllers similar to `Broker` or `Ingress`, and the default class being configurable cluster-wide?
- Such an approach IMHO would still preach the addressable sinks while providing users of our vendors with the needed smooth UX.

Again, as I said, I'm all for leveraging the experimental feature process to give this a shot. I'm just trying to surface some of the design goals we're trying to stick to and entertain other alternatives that provide the needed UX.
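For concreteness, a hypothetical `SinkAdapter` CR in this model might look like the sketch below. The group, fields, class name, and URL are invented for illustration: the spec carries the class and the non-addressable reference, and the class's controller writes the resolved URI into `status.address`, so the adapter itself satisfies the Addressable duck type.

```yaml
apiVersion: eventing.example.dev/v1alpha1   # hypothetical group/version
kind: SinkAdapter
metadata:
  name: pi-adapter
spec:
  class: job-runner                # selects the vendor's adapter controller
  ref:                             # the non-addressable resource to adapt
    apiVersion: my.batch.job/v1alpha1
    kind: JobDefinition
    name: pi
status:
  address:
    # written by the class's controller; illustrative URL
    url: https://jobrunner.system-ns.svc.cluster.local/default/pi
```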
I agree with @devguyio's sentiment. It feels like what you want is a resource in between, that can then implement whichever semantics you like (in your case calling some external service to fetch the URL the source should sink into) while also implementing the `Addressable` interface. Has that been considered?
I haven't yet read fully through the issue, but my first questions about the given use case:
- Why not use a Kubernetes `CronJob`? (And just out of curiosity, the `JobDefinition` resource looks very much like a standard K8s `Job`; is there any particular reason not to use a standard `Job`?)
- Besides the PingSource + custom Job use case (which I believe can be solved with standard K8s means), is there any other concrete use case? (@duglin mentioned that the given use case is only one of many.)

> other possible use cases include:
> - event -> buildDefinition (e.g. Shipwright build definition)
> - event -> workflowDefinition

This is where we should try to influence the respective community to implement the `Addressable` schema in order to play nicely with Knative (and without requiring a custom resolver in between, which always looks like a duct-tape approach).
- When you have control over your custom `JobDefinition` resource, what prevents you from implementing the `Addressable` interface for it?

Side note on `CronJob` vs. `PingSource`: are we sure that our PingSource is hardened enough for edge cases? Funnily enough, my very first Open Source project was https://metacpan.org/pod/Schedule::Cron, which I supported for more than ten years (not anymore), and there were tons of edge cases that people found and that needed to be fixed. Especially dealing with daylight saving switches is tricky and in some situations even non-deterministic (e.g. when turning back the hour from 3 to 2 and you have a job scheduled for 2:30, when do you fire the job? The "first" 2:30? The "second" 2:30? Both?).
Still trying to find the real use cases, because if the real use case is to use PingSource as a full-featured cron scheduler, then I would recommend checking whether the PingSource is suitable for production (like checking edge cases such as those in https://metacpan.org/release/ROLAND/Schedule-Cron-1.01/source/t/execution_time.t).
If using `CronJob`, there are potentially many more users and maintainers for fixing such issues.
Thanks guys for looking at this.
@devguyio @markusthoemmes: there is an intermediate data-plane object, but no corresponding CRD. See my comment above about double reffing.
@devguyio Agree with all your points except:
- The reference does not directly expose a webhook. The intermediate data-plane object does.
- The `SinkAdapter` CRD, since that's exactly what we are trying to avoid.
@rhuss looping in @cr22rc.
> there is an intermediate data-plane object, but no corresponding CRD

@lionelvillard ack. that's my understanding as well.

> The reference does not directly expose a webhook. The intermediate data-plane object does.

I also understood the same, and it's what I meant by "specify any arbitrary resource as a `Sink` or a `Subscriber` given that such resource exposes a webhook at an arbitrary URI".

> SinkAdapter CRD since that's exactly what we are trying to avoid

My understanding is that you are trying to avoid making the user use an adapter CRD, right? The example I gave above will do exactly that: the user doesn't have to know anything about a SinkAdapter CRD. The `platform` is the one that'll create that CR for the user under the hood. This is basically the resolver service you're asking for, but using the k8s API (the CR is the request to resolve, and the CR status is the response with the URI). There's no extra adapter data plane in this model.
@devguyio this is an interesting solution. I don't really like the idea of changing the Destination API; `ref` in my mind is just fine.
I really like @duglin's idea about the ConfigMap that allows people to define a mapping between "Kind" and "URL-Template". Easy to implement, no external dependencies, no extra network hop (one of Scott's concerns). It also addresses @evankanderson's question/concern about re-triggering the address resolution, which now becomes technically feasible.
I just realized this has an impact on DomainMapping /cc @julz
If we aren't as worried about an extra network hop, it also seems feasible to create a service which uses URLs containing the k8s `kind` + `name` exposed in the Destination along with a special "resolver" object `ref`, where the resolver contains both the naming oracle and a proxy (or define 3xx behavior for Eventing as something other than a retry).
Note that the key behavior we're depending on for Destination (the thing that references a K8s Service or Addressable) is that the `ref`ed object will have a Watch-able change when the URL for a resource changes. (Technically, we have two of these watchable changes, one of which is the Kubernetes DNS service, which provides a short-TTL DNS record, and one of which is the Addressable/Service abstraction.)
@evankanderson The current approach (kind -> URL mapping in a ConfigMap) preserves this key behavior:
- When the referenced object is changed, a new URL is potentially generated, since any object can be tracked (modulo RBAC; not implemented in #5599).
- When the ConfigMap containing the Kind -> URL mapping is changed, all objects containing `ref`s can be re-reconciled (not implemented in #5599). A minimal sketch of this re-trigger mechanism follows.
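As a rough sketch of that re-trigger mechanism, using plain client-go informers: watch the mapping ConfigMap and, on change, resync every object holding a `ref`. The ConfigMap name/namespace and the `resyncAll` hook are assumptions for illustration; the actual wiring would live in the shared controller machinery.

```go
// Sketch: re-trigger KReference resolution when the mapping ConfigMap changes.
package resolver

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchMappings invokes resyncAll whenever the (hypothetical) kreference-mapping
// ConfigMap changes, so previously resolved URIs are recomputed against the new
// templates. The caller starts the returned informer and waits for cache sync.
func watchMappings(client kubernetes.Interface, resyncAll func()) cache.SharedIndexInformer {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute, informers.WithNamespace("knative-eventing"))
	inf := factory.Core().V1().ConfigMaps().Informer()
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, obj interface{}) {
			if cm, ok := obj.(*corev1.ConfigMap); ok && cm.Name == "kreference-mapping" {
				resyncAll() // e.g. enqueue every object whose spec holds a ref
			}
		},
	})
	return inf
}
```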
Just to speak it out loud (re @devguyio's suggestion): you could install a webhook that leaves the API intact and does the translation for you, so that

```yaml
ref:
  apiVersion: my.batch.job/v1alpha1
  kind: JobDefinition
  name: pi
```

becomes

```yaml
ref:
  apiVersion: adapters.com/v1alpha1
  kind: MySinkAdapter
  name: pi
```

automatically, and all interfaces can stay intact. That'd leave the interface to be purely K8s API based, which has some level of beauty to it, imo anyway.
@markusthoemmes that's an interesting solution. However, I don't like the idea of having the webhook change the spec; too confusing. It's fine for the webhook to populate defaults but nothing more than that, IMO. It also makes exporting applied resources to yaml non-trivial.

> all interfaces can stay intact.

That's nice to have but not a requirement.

> That'd leave the interface to be purely K8s API based,

k8s or Knative? `ref` is k8s, Addressable is Knative.
FWIW, the resolved sink appears in the status section so it's clear where events are sent.
I'm not quite parsing in which way that's more confusing vs. having the resolution done by some service that the user can't even inspect. In the outlined approach, the user would see exactly what's happening and can check the respective adapter to see what the actually resolved URL was (aka: where their stuff is sent).
This is similar behavior to what istio sidecar-injection is doing as well, so we're not out of bounds wrt. changing spec on apply. After all, the response the user gets from the apply would contain the mutations.

> having the resolution done by some service that the user can't even inspect.

We moved away from this solution. Sorry I didn't update the issue description (will do).

> the user would see exactly what's happening and can check the respective adapter to see what the actually resolved URL was (aka: where their stuff is sent).

This is still the case with the mapping solution (see above): the user sees exactly where events are sent by looking at the resolved URL in the status section. The adapter may or may not send events. In the example above, the adapter creates jobs.

> This is similar behavior to what istio sidecar-injection is doing as well, so we're not out of bounds wrt. changing spec on apply. After all, the response the user gets from the apply would contain the mutations.

Is this just adding stuff or also changing the original spec?
Anyhow, one of the goals here is to hide (for lack of a better word) the inner workings. The user does not care about the adapter; that's just plumbing.
Just to make sure I understand: in the JobDefinition example, is it the case that if we wrote status.address.url to the JobDefinition in a controller (i.e. made it implement Addressable), things would "just work" without needing this? Putting aside whether doing that is a good idea or not, are there examples where something like that (or indeed Evan's suggestion of an EventTriggeredJob with the same semantics as JobDefinition but that implements Addressable) wouldn't fulfill the use case?

> Just to make sure I understand: in the JobDefinition example, is it the case that if we wrote status.address.url to the JobDefinition in a controller (i.e. made it implement Addressable), things would "just work" without needing this?

Yes.

> Putting aside whether doing that is a good idea or not, are there examples where something like that (or indeed Evan's suggestion of an EventTriggeredJob with the same semantics as JobDefinition but that implements Addressable) wouldn't fulfill the use case?

Yes for the use case and no for the desired UX. We would like to hide (in the spec, not in the status) EventTriggeredJob.

> no for the desired UX

Just to push on this for a moment: in what way is the UX different in this approach? If we added a controller - or indeed a webhook - which placed status.address on JobDefinition (which I think we could do, and which could even be driven by a ConfigMap), wouldn't the UX be identical, other than - to @markusthoemmes's point - the user could predict which URL would be used by looking at the JobDefinition, without having to wait for the sink's status to be resolved?

> which placed status.address on JobDefinition (which I think we could do, ...

That's an interesting idea. It's not always possible though, since it's possible to define a closed schema.
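To illustrate the alternative being discussed: if a controller or webhook wrote the address into the JobDefinition's status, the resource would satisfy the Addressable duck type and the existing `ref` resolution would work unchanged (the URL below is purely illustrative, and this only works where the CRD's schema permits the extra status field):

```yaml
apiVersion: my.batch.job/v1alpha1
kind: JobDefinition
metadata:
  name: pi
status:
  address:
    # written by a hypothetical controller/webhook; not part of the original CRD
    url: https://jobrunner.system-ns.svc.cluster.local/default/pi
```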
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
/triage-accepted
/remove-lifecycle stale
/triage accepted