[EPIC] Audit & improve sensu-backend logging

Open calebhailey opened this issue 2 years ago • 7 comments

Description

Sensu backend log output can be cumbersome for some users, especially when embedded etcd is enabled. We need to audit the sensu-backend's log output and update it where needed, including moving messages to different log levels.

Goals

  • Make log output more tunable by Sensu operators – ideally without restarting services
  • Establish criteria for ERROR, WARN, INFO, and DEBUG log levels
    • should WARN level log output be actionable?
    • should ERROR be reserved for matters concerning Sensu platform reliability?
  • Establish criteria for log output that should become Sensu events
    • what type of pipeline configuration errors should end-users know about?
    • what logs could be "cluster events" (i.e. backend events published to the sensu-system namespace)?
    • what logs could be "namespace events" (i.e. backend events published to user-facing namespaces)?

Proposal

TBD

Spec

Coming soon.

Possible inspiration:

  • [ ] #3809
  • [ ] #4415

Issues

  • [ ] sensu/sensu-enterprise-go#2161
  • [ ] #4642
  • [ ] #4538
  • [ ] #4496
  • [ ] #4494
  • [ ] sensu/developer-advocacy#35
  • [ ] sensu/sensu-enterprise-go#1871
  • [ ] sensu/sensu-enterprise-go#1602
  • [ ] sensu/sensu-enterprise-go#1173
  • [ ] #2894
  • [ ] sensu/sensu-enterprise-go#249
  • [x] sensu/sensu-enterprise-go#248
  • [ ] #4703
  • [x] #4395
  • [x] sensu/sensu-enterprise-go#2298
  • [x] sensu/sensu-enterprise-go#2323
  • [ ] #4842
  • [x] sensu/sensu-enterprise-go#2270
  • [x] sensu/sensu-enterprise-go#2453
  • [x] sensu/sensu-enterprise-go#2475

calebhailey, Apr 22 '22 15:04

Quick'n dirty thoughts:

Every time I need to debug a pipeline (and its resources), I need to see the raw event input, which forces me to use the debug log level (extremely verbose). When a cluster is under significant load, debug logging is too costly (it negatively impacts backend performance), so I am forced to create a separate pipeline with a cat handler to get enough context. Many log events reference an event_id. This is fine when the event has been persisted (has a check); however, there is no record (without debug logging) of events that only contain metrics.

portertech, Apr 22 '22 16:04

Hey team! Please add your planning poker estimate with ZenHub @amdprophet @c-kruse @ccressent @echlebek

fguimond, Aug 10 '22 21:08

@fguimond I'm a little confused about what we are meant to be estimating here. Everything in the epic?

c-kruse, Aug 11 '22 17:08

@c-kruse - Don't worry about it. I must have selected it by accident.

fguimond, Aug 11 '22 18:08

Need to add https://github.com/sensu/sensu-go/issues/4842 to this as well

asachs01, Aug 19 '22 17:08

Hey, we completed an issue from this epic in 6.8.0! That's progress! 🙌

I will roll this EPIC to 6.9.0 now and aim to complete at least one more issue in that milestone.

calebhailey, Aug 19 '22 21:08

@fguimond we need to add https://github.com/sensu/sensu-enterprise-go/issues/2453 to this. This issue was raised in a recent interaction.

asachs01, Aug 29 '22 16:08