
k0s log handling discussion

Open juanluisvaladas opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

We discussed #3122 a bit and decided to start a clean discussion to settle exactly what we want to change and what we want to keep.

There are a few problems we noticed:

  1. The documentation is bad. We need to document exactly how logging can be configured, the log level per component, etc.
  2. Are our defaults sane?
  3. Does it make sense to have a single field in the configuration to manage every component?
  4. Should we use journald.Send to send different logs to different locations?
  5. Since we're making the change anyway and logrus is in maintenance mode, should we consider moving to a different library? slog is an option, zerolog, etc.

Also, we don't want to handle log rotation and the like, so we want to keep the scope of our problems fairly simple.

juanluisvaladas, May 24 '23 11:05

I'm about to go on vacation for a week, so let me just chime in with some initial thoughts on these.

  1. The documentation is bad. We need to document exactly how it can be configured, log level per component, etc.

I agree and disagree. The organization of items in the documentation is fine. The simple video on the front page showing how to set up a k0s cluster is fast and easy, but once you get into the weeds, the true beginner needs more hand-holding. I think there is a better way to organize things than all these markdown files spread across the core project and the k0sctl project. Maybe a dedicated site/repo for all things k0s.

What really threw me off for a while, before I read up and found the LB section in more depth, is that users will be excited to say "Hey, we can run an HA k0s! Cool! Let's spin up two controllers and party on!" Well, err... you need that LB set up first. So I think the documentation needs a small guide that shows the paths users can take for their setup. I read only one online article from someone migrating k0s to an HA setup. I think they were doing it manually instead of with k0sctl, but you still need the LB setup, which was left out of the article.

  2. Are our defaults sane?

I think defined server specs, including default disk size and Linux partitioning, should be looked at. The 16 GB default image for Ubuntu wasn't enough for me; I had to bump it up to 80 GB to deal with the syslog issue before I solved the LB issue.

  3. Does it make sense to [have] a single field in the configuration to manage every component?

I'm more in favor of a config field per component. I've been actively working on a Chef automation recipe that keeps the system in compliance, including automating the k0sctl part so that new nodes can come online (excluding HA controller setups, since the NLB has to be set up beforehand). Besides that, a per-component config seems like a better idea. (My Chef recipe makes sure the SSH keys are the same as those known by the k0sctl box.)

4 & 5 combined...

I for one wouldn't want to mess with any default logging systems that already monitor Kubernetes environments, but using journald and having it write into a single dedicated k0s log file, rather than separate ones like k0s-kube-router, etc., might be good enough. This way sysadmins (like myself) could set up their systems to rotate the logs without issue and not be forced into whatever k0s does. But I think some initial default size limits, like 10 GB files, would be better than the 40 GB files I was dealing with once they were separated from 'syslog'.

Thanks, Shane

Bugs5382, May 25 '23 18:05

1- The documentation is bad. We need to document exactly how it can be configured, log level per component, etc.

I think per-component configuration is too fine-grained; that can be left to some third-party log analyzer.

2- Are our defaults sane?

I think there's a bit too much being logged at INFO, things like:

  • logrus.Info("no changes detected for konnectivity-server")
  • logrus.Infof("starting to count controller lease holders every 10 secs")
  • logger.Info("reconcile has nothing to do")
  • a.log.Info("CSR Approver context done")
  • c.log.Infof("current cfg matches existing, not gonna do anything")

I think going through the .Info calls in the source and judging whether they're actually informational or debug messages would be a good start.

3- Does it make sense to have a single field in the configuration to manage every component?

I think so, as long as the logs are structured and there's a field for the component. Then it's up to users to filter with whatever tool they like.

4- Should we use journald.Send to send different logs to different locations?

My understanding is that journald has only one place where logs go. It does have the SYSLOG_FACILITY field for backwards compatibility, but journald itself does not have multiple facilities like "kernel", "auth", "user", etc.

But it would make sense to log to journald directly when k0s is started as a systemd unit, because then the logs would be structured and you could use journald's query filters to look for things.

Logging only to journald would make journald a requirement.

5- Since we're doing the change and logrus is in maintenance mode, should we consider moving to a different library? slog is an option, zerolog

The journald lib recommends the author's own "logf" package for interoperability, but it doesn't look very active. A quick search doesn't turn up any slog-journald adapters, but writing one shouldn't be a huge undertaking. Zerolog comes with a journald writer.

kke, May 29 '23 07:05

I've been looking into this a bit further.

Logrus is known to be quite slow, but even so, it doesn't account for a significant amount of time. In some scenarios it doesn't show up in the profiler traces at all, meaning its overall impact is very small: even if we could make it spend 3 orders of magnitude less time, the overall effect would not be significant.
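Amdahl's law makes this argument concrete. If logging takes a fraction $p$ of the CPU time and becomes $s$ times faster, the overall speedup $S$ is given below; the value $p = 0.005$ is purely illustrative and not a measurement of k0s.

```latex
S(p, s) = \frac{1}{(1 - p) + p/s}

S(0.005,\ 1000) = \frac{1}{0.995 + 0.000005} \approx 1.005
\qquad
\lim_{s \to \infty} S(0.005,\ s) = \frac{1}{0.995} \approx 1.005
```

So under that assumption, even an infinitely fast logger would make the process only about 0.5% faster.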

Right now we have a lot of logging libraries built into our binary:

  1. github.com/sirupsen/logrus
  2. k8s.io/klog/v2 (indirect, used by Kubernetes components)
  3. go.uber.org/zap (only used for etcd backup)
  4. github.com/bombsimon/logrusr/v4 (a logr interface for logrus)
  5. github.com/logrusorgru/aurora/v3 (color library for logrus)
  6. github.com/containerd/log (apparently used by airgap, probably as a consequence of github.com/containerd/platforms)
  7. github.com/go-logr/stdr (used in multiple places)

I don't see an easy way to reduce the disk space these take. I ran a few quick tests, and zap only accounts for around 2 MB of the whole binary (267 MB as of 1.33), so even if we managed to add a bunch of replace directives in go.mod pointing to a much lighter implementation, the improvement wouldn't be significant for the effort it would require...

juanluisvaladas, May 15 '25 11:05

Concerning the excessive logging reported in the original issue: I think a lot of error logs are somewhat expected when the cluster is in a very bad state. We could discuss whether we want to focus more on metrics and error counters in the future, so that folks can silence their logs and rely on telemetry instead. However, let's focus on the logging part for now.

I see two major parts in here which are only very loosely connected to each other:

  1. The technical implementation
  2. How k0s uses logs

Concerning the technical implementation:

Right now we have a lot of logging libraries built into our binary:

  1. github.com/sirupsen/logrus
  2. k8s.io/klog/v2 (indirect, used by Kubernetes components)
  3. go.uber.org/zap (only used for etcd backup)
  4. github.com/bombsimon/logrusr/v4 (a logr interface for logrus)
  5. github.com/logrusorgru/aurora/v3 (color library for logrus)
  6. github.com/containerd/log (apparently used by airgap, probably as a consequence of github.com/containerd/platforms)
  7. github.com/go-logr/stdr (used in multiple places)

klog is (obviously) used by all the Kubernetes stuff, zap by the etcd client, and logrusr by controller-runtime. The containerd/log and go-logr/stdr packages aren't used anywhere within k0s itself; they're transitive dependencies, probably both from containerd. So we have to take those for granted, as there's no way they'll go away. Aurora, OTOH, isn't concerned with logging at all: it's for ANSI terminal control sequences and is currently only used by k0s sysinfo. I think logrus will use it as well if it detects that stdout is a terminal.

I'd have the following wishes for k0s logging:

  1. There should be a frontend (i.e. some sort of facade) that k0s can use to issue structured log statements. It should be convenient to use.
  2. There should be some backend that performs the actual logging.
  3. The backend output format should be flexible, e.g. the log format, JSON, Windows event log, potentially journald... Not saying that k0s should support all of these, but the backend should at least not be an obstacle to adding them in the future.
  4. It should be straightforward to bridge the logging facades used by k0s's dependencies to the k0s backend, so that the k0s log output config applies to everything k0s uses.
  5. Performance overhead shouldn't be terrible. We don't need light speed, and logrus is definitely a slow option to begin with, but performance should obviously not be worse than what's used right now.
  6. It would probably be future-proof if the log library could be integrated with tracing, although that's not really a priority at all.

The only part we can get rid of is logrus (although I bet it will still linger around as a transitive dependency of some other lib; didn't check 🙈). As a facade, k0s could use logr (just like controller-runtime), and zap as a backend (just like etcd). As far as I can tell, nothing in k0s uses slog so far. While I'm not saying that slog wouldn't be an option, I'd probably just stick with what others are doing, unless there are compelling reasons not to.

I guess zap will have all the bells and whistles for the other potential improvements, such as filtering and different output options.


Now the second part:

I think we need to come up with some (not too long) guidelines on how k0s should log things: what to log, when, and at which level; style conventions for log messages; and conventions for how k0s should do structured logging (which fields to log, what the naming convention is).

Then we can discuss the defaults, the supported backends, and how to configure this. And, of course, fix all the places where k0s currently logs in ways that don't follow the guidelines.

twz123, Jul 09 '25 13:07