simpler observability for Knative eventing system components

aslom opened this issue 4 years ago · 11 comments

Problem: there is no easy way to observe what is going on in Knative Eventing system components, and users may not have access to the system namespaces where Knative Eventing is installed.

Persona: Which persona is this feature for?

  • Event consumer (developer)
  • System Integrator
  • Contributors

Exit Criteria: User can observe Knative Eventing system components.

Time Estimate (optional): How many developer-days do you think this may take to resolve? 1–∞

Additional context (optional) Add any other context about the feature request here.

This came up during Source WG discussion

@lionelvillard @n3wscott @cr22rc @nachocano @lberk @grantr @bryeung

aslom avatar Jun 23 '20 16:06 aslom

We should explore producing CloudEvents in response to milestones in progress on our control and data planes.

These could optionally be sunk to Kubernetes Events, or to a Broker and then directed to a namespace for a user to observe errors in their multi-tenant control planes without leaking secret info.

It is an interesting thing to explore.
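
For illustration, a minimal sketch of what such a diagnostic CloudEvent might look like, with its attributes rendered as YAML. The type and source naming scheme here is invented for illustration, not an agreed-upon convention:

# Hypothetical diagnostic event; the type/source scheme is illustrative only.
specversion: "1.0"
type: dev.knative.eventing.diagnostic.dispatch.failed   # hypothetical type
source: /apis/v1/namespaces/knative-eventing/brokers/default
id: 1234-5678-90ab
time: "2020-06-23T16:00:00Z"
datacontenttype: application/json
data:
  message: failed to dispatch event to subscriber
  namespace: my-namespace   # the tenant namespace the failure relates to
  trigger: my-trigger

The data payload would need to be scrubbed of anything sensitive before leaving the system namespace, per the multi-tenancy concern above.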

n3wscott avatar Jun 23 '20 16:06 n3wscott

More on the problem: when Knative is installed, some of its components (such as the mt-broker pods or source adapters) may run in system namespaces. How do we get logs (and other observability data) to the user namespace?

For getting logs: https://github.com/knative/eventing/issues/3299

aslom avatar Jun 23 '20 16:06 aslom

One idea was to turn system logs and events into CloudEvents that are then routed to the user namespace @n3wscott ?
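
As a sketch of the routing half of that idea: assuming the platform forwarded such diagnostic CloudEvents to a Broker in the user's namespace, the user could filter them with an ordinary Trigger. The event type and subscriber name below are hypothetical:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: system-diagnostics
  namespace: my-namespace   # the user's namespace
spec:
  broker: default
  filter:
    attributes:
      type: dev.knative.eventing.diagnostic.dispatch.failed   # hypothetical type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: diagnostics-viewer   # hypothetical sink, e.g. an event-display service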

aslom avatar Jun 23 '20 16:06 aslom

I was thinking that this may work great if users can create, in the user namespace, a special Knative Eventing Observability CR for diagnostics, so they do not need to run multiple logs or describe commands, install additional tools, or have system-namespace permissions:

apiVersion: eventing.knative.dev/v1alpha1
kind: Observability
metadata:
  name: diagnostic
spec: {}   # additional options for what to observe, filtering, etc. - reasonable defaults apply

After applying the CR in the user namespace:

kubectl apply -f knative-eventing-observability.yaml

the user can then describe the created object and see the status of Knative Eventing in their namespace, Kubernetes events, etc.:

kubectl describe observabilities.eventing.knative.dev diagnostic
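
A sketch of what that describe output might surface, assuming the controller aggregates component health into standard status conditions; every field value below is hypothetical:

Name:         diagnostic
Namespace:    my-namespace
API Version:  eventing.knative.dev/v1alpha1
Kind:         Observability
Status:
  Conditions:
    Type:     BrokerHealthy            # hypothetical condition type
    Status:   False
    Reason:   DispatchFailures
    Message:  3 delivery failures in the last 5m; see diagnostic pod logs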

For a good user experience, there may also be a special pod created that gathers logs from the system namespaces and makes them available in the user namespace:

kubectl logs diagnostic-XYZ-123

And when done, simply clean up (and avoid the overhead of diagnostic observability):

kubectl delete -f knative-eventing-observability.yaml

aslom avatar Jun 23 '20 16:06 aslom

Note that we also want consistency with Serving, and I'm not 100% sure a hard dependency on CloudEvents is the right direction, but it's worth a try.

The key point here is to agree on the need to produce diagnostic events in our data planes, beyond metrics. How it's implemented is a different question.

@dprotaso @mattmoor @mdemirhan thoughts?

lionelvillard avatar Jun 23 '20 17:06 lionelvillard

How do you see this being related to #3299 ?

As an end user, I think the most important thing to see is errors. Ideally, those should be associated with the entity the error is about from the user's POV. For example, an error with the GitHub source/adapter should probably be related to (seen through) the GitHub CR.

A more generic error-reporting tool (e.g. LogDNA) might be useful, but I tend to think of those as deeper analysis tools that, if people really want, they can set up. And I kind of view the idea of generating CEs with different sinks in that category... more advanced. But for the simple use cases I think most people would prefer to look at the CR, so I'd prefer to solve that one first.
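
For context, source CRs already surface adapter errors through the standard Knative status-condition pattern; a sketch of how that might look on a GitHub source, with the reason and message values invented for illustration:

status:
  conditions:
    - type: Ready
      status: "False"
      reason: DeploymentUnavailable   # illustrative
      message: adapter deployment "github-adapter" is unavailable   # illustrative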

duglin avatar Jun 24 '20 15:06 duglin

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Nov 23 '20 01:11 github-actions[bot]

/reopen Being discussed for GSoC

csantanapr avatar Apr 01 '22 01:04 csantanapr

@csantanapr: Reopened this issue.

In response to this:

/reopen Being discussed for GSoC

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow[bot] avatar Apr 01 '22 01:04 knative-prow[bot]

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jul 01 '22 01:07 github-actions[bot]

/remove-lifecycle stale

aslom avatar Jul 01 '22 12:07 aslom

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Sep 30 '22 01:09 github-actions[bot]

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot avatar Oct 30 '22 02:10 knative-prow-robot

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jan 30 '23 01:01 github-actions[bot]