cartographer Surfacing information to users

Problem

Platform builders want to surface information to users of the platform.

Example

A platform builder may build supply chains that support a variety of underlying runtimes for web applications, for example the default k8s Deployment type or a specialized runtime like Knative. In either case the platform builder may want to surface the URL that should be used to access the web service. Today, platform tools have to be aware of the resources created by the supply chain and know to look in a variety of locations. It would be useful if there were a canonical location where platform builders could look for any info they want to surface to users.

Considerations

There is ongoing work to surface information about the resources that a supply chain stamps out on the Workload status, including the "healthiness" of the resource. This makes sense for developer users, since the Workload is their interface to the system. However, in the example above, the URL might align more closely with a runtime cluster that has a Deliverable rather than a Workload. The same would be true for other runtime information a developer might wish to know, like health and number of replicas.

Mar 31 '22 17:03 zrob

From my point of view, having the information readily available from the workload resource would be ideal. I.e. if you want to know the 'clickable url' for a given workload, you get the workload's yaml or json representation via the k8s api or kubectl get -o json, and the info should be in there. For example, today we can find the url in the knative service resource's 'status' field; if we could find it in workload.status instead, that piece of code for 'finding the clickable url' would be much less fragile.
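To make the lookup concrete, here is a minimal Python sketch of reading the URL from a resource's JSON representation. The Knative Service sample mirrors the shape of `kubectl get ksvc -o json` output, trimmed down. The `status.url` field on the Workload is hypothetical: it illustrates the request in this comment, not something Cartographer is known to provide. The helper name `status_url` is also just an illustration.

```python
import json
from typing import Any, Dict, Optional


def status_url(resource: Dict[str, Any]) -> Optional[str]:
    """Return resource.status.url if it is a non-empty string, else None."""
    url = (resource.get("status") or {}).get("url")
    return url if isinstance(url, str) and url else None


# Trimmed sample shaped like `kubectl get ksvc my-app -o json` output.
knative_service = json.loads("""
{
  "apiVersion": "serving.knative.dev/v1",
  "kind": "Service",
  "metadata": {"name": "my-app"},
  "status": {"url": "http://my-app.default.example.com"}
}
""")

# Hypothetical: a Workload whose status carries the same field.
workload = {
    "apiVersion": "carto.run/v1alpha1",
    "kind": "Workload",
    "metadata": {"name": "my-app"},
    "status": {"url": "http://my-app.default.example.com"},
}

print(status_url(knative_service))
print(status_url(workload))
```

If the field lived on the Workload, a client would need only the second lookup and would not have to know which runtime sits underneath.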

This is just my 2 cents, feel free to ignore it completely. (Because I don't know much about the technicalities / internals of cartographer and how you'd make something like this happen, or if it is practically feasible at all. So I don't really know if this is a good idea or not.)

Mar 31 '22 17:03 kdvolder

This might be tricky for cartographer to achieve. It only knows about resources that it stamps out, not the underlying children. This is especially tricky on the runtime cluster where all the resources typically get bundled together as a config blob, and applied through a kapp App. In this instance cartographer doesn't know anything about knative.

Mar 31 '22 17:03 jwntrs

Arguably if it is tricky, then... all the more reason you do not want everyone looking for that piece of information to implement their own fragile implementation of solving this tricky problem. So hopefully you can come up with some way to make this a reality.

Mar 31 '22 17:03 kdvolder

The apps cli today will query Knative Services that match a label; the URL is then read from the status of those resources. If there are no Knative Services, or those services exist on a different cluster, the URL will not be discovered by the cli.
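The discovery described here can be sketched roughly as follows in Python, working on an already parsed list response (the shape returned by `kubectl get ksvc -o json`). This is not the apps cli's actual code. The label key `carto.run/workload-name` and the function name `discover_urls` are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Assumed label key; check what your supply chain actually stamps on resources.
WORKLOAD_LABEL = "carto.run/workload-name"


def discover_urls(service_list: Dict[str, Any], workload: str) -> List[str]:
    """Collect status.url from every listed Service labelled for `workload`."""
    urls = []
    for item in service_list.get("items") or []:
        labels = (item.get("metadata") or {}).get("labels") or {}
        url = (item.get("status") or {}).get("url")
        if labels.get(WORKLOAD_LABEL) == workload and url:
            urls.append(url)
    return urls


sample = {
    "items": [
        {
            "metadata": {"name": "my-app", "labels": {WORKLOAD_LABEL: "my-app"}},
            "status": {"url": "http://my-app.default.example.com"},
        },
        {
            "metadata": {"name": "other", "labels": {WORKLOAD_LABEL: "other"}},
            "status": {"url": "http://other.default.example.com"},
        },
    ]
}

print(discover_urls(sample, "my-app"))         # ['http://my-app.default.example.com']
print(discover_urls({"items": []}, "my-app"))  # []
```

The empty result in the second call corresponds to the failure mode above: with no Knative Services on the queried cluster, nothing is discovered.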

Mar 31 '22 18:03 scothis

There is no workload on the runtime cluster so no workload.status to inspect for the URL. For env's where there is a workload, can cartographer provide the URL as part of the workload.status reliably (regardless of which runtime the workload lands on)?

Mar 31 '22 18:03 heyjcollins

There is no workload on the runtime cluster so no workload.status to inspect for the URL. For env's where there is a workload, can cartographer provide the URL as part of the workload.status reliably (regardless of which runtime the workload lands on)?

The Deliverable is actually what's responsible for creating the resources defined by the Workload/SupplyChain. I don't think we should surface information on the Workload from the Deliverable, even when they are both defined in the same cluster. We need to decouple the Workload and Deliverable resources to have a sane multi-cluster story, while the ask is to further entangle the two resources. Users and clients should become more aware of the Deliverable.

What information to surface on the Deliverable status is a conversation worth having. As @jwntrs points out, this is further complicated by ClusterDelivery often delegating to a kapp-controller App to manage the individual resources.

Apr 01 '22 18:04 scothis

I don't think I understand most of the technical arguments I read above.

Anyhow, the request from me is really not to push for making the information available on the workload.status. That would be convenient, sure, but any solution that allows a user or client to reliably answer the question "given workload X, what is the public URL (if any) associated with that workload" is really acceptable to me.

The point is really, that this is a high-level question that any user/tool might ask and they should be able to get the answer easily and reliably. I think that if that's not possible then that is a problem that needs to be solved somehow. Saying that it is 'too hard' isn't a very good argument for not solving it. (That just means hoisting the responsibility of solving the problem onto various other tools such as tanzu cli or vscode-tanzu tools, or... even worse, the end users themselves).

I realize answering the question reliably is hard for various technical reasons (the details of which are a bit lost on me). But I think it boils down to the fact that when someone creates a workload resource, that sets off a complex chain of causally connected events, and ultimately this is intended to 'publish' a running app on a public url.

So, there is a causal connection / relation between the workload and the url in the user's mind. Clearly the url exists only because the workload caused it indirectly. These things are therefore not completely 'disconnected'.

The arguments I read above, (disclaimer: as far as I understand them :-) seem to boil down to saying:

"it is hard to propagate information backwards on this causal chain"
"we don't want to propagate information backwards because thing X and thing Y should be 'decoupled'"

I grant that it is hard, I'm sure that's true. I also grant that this kind of 'decoupling' is desirable.

However I disagree that this decoupling means we shouldn't provide the tools to understand how they are connected.

To make the point clear, I am thinking of an analogous argument that sounds 'false' to me. Consider procedure calls in a programming language.

We do want to design interfaces/apis such that callers are 'decoupled' from the 'callee' side.

However this does not mean that we don't want to be able to get a stacktrace when something goes wrong in the 'callee'.

The ability to connect errors / stacktraces back to an 'original caller' through a long chain of calls is clearly something we do want, and it is something every programming language runtime works hard at giving you.

Similarly, 'is there a problem with my workload?' and 'which exact component / resource / cluster does that problem originate from?' should be questions our tools/users are able to answer. If we design things in such a way that answering these questions reliably is not possible or very hard, then I think that is a problem.

As @zrob also mentions in the original ticket description, the "understand my workload's healthiness" and "what is my workload's URL" questions are similar in that both of them require understanding the structure of 'the causal chain' that connects all the 'stuff' that a workload causes to happen/exist.

Apr 06 '22 18:04 kdvolder

related: https://github.com/vmware-tanzu/cartographer/issues/829

Apr 29 '22 18:04 cirocosta

In either case the platform builder may want to surface the URL that should be used to access the web service

In some cases, the application may not surface a web interface, or a web interface may not be the primary way that work is submitted to the application (two examples are alternate protocols like MQTT or SMTP and pull-based queues like RabbitMQ or Kafka). Additionally, some workloads (both Knative and otherwise) may only be accessible from within the cluster, so "surfacing a URL" may not be sufficient for an end user to interact with the application.

As a simple example, consider an application which is deployed as a gRPC+HTTP (i.e. "web") application as a cluster-internal service using a Deployment + a K8s Service of type: ClusterIP. In order to access the web interface of this application, it's necessary for the user to run kubectl port-forward or run a container on the application's cluster to be able to reach the application. For instance, the goldilocks tool uses this to require k8s credentials to be able to see the cluster sizing recommendations: https://goldilocks.docs.fairwinds.com/installation/#viewing-the-dashboard

A second example is that a service may only be accessible at an external URL by combining a resource which already exists on the cluster with the delivered application. An example of this would be creating a Kubernetes Ingress on the cluster which references a Service delivered through a cartographer Workload. The URL would be determined by the Ingress, which might look like the following:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
spec:
  rules:
  - host: example.com  # This might be different per-cluster
    http:
      paths:
      - path: /app  # Endpoint for my-app would be http://example.com/app
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
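As a rough illustration of why the URL has to be derived from the Ingress rather than from the delivered Service, the Python sketch below walks an Ingress (as a parsed dict) and builds the URLs whose backend is a given Service. The function name `ingress_urls` is hypothetical, and the scheme choice (https only when the host is listed under `spec.tls`) is a simplifying assumption. Real setups may terminate TLS elsewhere.

```python
from typing import Any, Dict, List


def ingress_urls(ingress: Dict[str, Any], service_name: str) -> List[str]:
    """Build URLs for every Ingress path whose backend is `service_name`."""
    spec = ingress.get("spec") or {}
    # Hosts listed under spec.tls are assumed to be served over https.
    tls_hosts = {
        host
        for entry in spec.get("tls") or []
        for host in entry.get("hosts") or []
    }
    urls = []
    for rule in spec.get("rules") or []:
        host = rule.get("host")
        if not host:
            continue  # host-less rules match any host, so no URL can be built
        scheme = "https" if host in tls_hosts else "http"
        for path in (rule.get("http") or {}).get("paths") or []:
            backend = (path.get("backend") or {}).get("service") or {}
            if backend.get("name") == service_name:
                urls.append(f"{scheme}://{host}{path.get('path', '/')}")
    return urls


# The Ingress from the example above, as a Python dict.
minimal_ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "minimal-ingress"},
    "spec": {
        "rules": [
            {
                "host": "example.com",
                "http": {
                    "paths": [
                        {
                            "path": "/app",
                            "pathType": "Prefix",
                            "backend": {
                                "service": {"name": "my-app", "port": {"number": 80}}
                            },
                        }
                    ]
                },
            }
        ]
    },
}

print(ingress_urls(minimal_ingress, "my-app"))  # ['http://example.com/app']
```

Note that nothing in the Service named my-app carries this URL; a tool has to find the Ingress, which may not have been created by the supply chain at all.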

May 04 '22 21:05 evankanderson

In some cases, the application may not surface a web interface, or a web interface may not be the primary way that work is submitted to the application

Those are valid points, but I think that merely means that it doesn't make sense for all workloads to advertise a URL on which they can be reached. There are, however, still many workloads for which it does make sense, and it is for those workloads that we want to be able to obtain the url relatively easily (e.g. by reading 'status.url'). For those workloads for which the concept of 'url on which you can be reached' doesn't make sense, that could be indicated by setting the 'status.url' field to some value like 'false' or 'nil' (i.e. a special value meaning 'not applicable for this workload').
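A client consuming such a field would need to tell three cases apart: a URL is present, the URL is explicitly not applicable, or nothing has been reported yet. The Python sketch below shows one way to do that. The `status.url` field and the convention that `false` or `nil` (here `False` or `None`) means "not applicable" are hypothetical, taken from the suggestion in this comment rather than from any existing Cartographer API.

```python
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class UrlState(Enum):
    AVAILABLE = "available"
    NOT_APPLICABLE = "not-applicable"
    UNKNOWN = "unknown"


def read_url(status: Dict[str, Any]) -> Tuple[UrlState, Optional[str]]:
    """Interpret a hypothetical status.url field, including its sentinel values."""
    if "url" not in status:
        return UrlState.UNKNOWN, None  # field absent: nothing reported yet
    url = status["url"]
    if url is None or url is False:
        return UrlState.NOT_APPLICABLE, None  # explicit "no URL for this workload"
    if isinstance(url, str) and url:
        return UrlState.AVAILABLE, url
    return UrlState.UNKNOWN, None  # empty or unexpected value


print(read_url({"url": "http://my-app.default.example.com"}))
print(read_url({"url": False}))
print(read_url({}))
```

Keeping "absent" distinct from "not applicable" lets a tool decide whether to keep waiting for a URL or to stop looking.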

May 05 '22 00:05 kdvolder
