caddy System for selectively omitting logs, on a per-log basis

This issue is to collect and gauge interest in this feature before we spend time working on it.

There's demand for having Caddy skip logging access logs selectively; I opened issue #4689 which should solve the specific case of access logs. Caddy doesn't only emit access logs, though. There's all kinds of logs, from many different sources (other Caddy apps, stdlib captured logs, dependencies).

The idea is that we could implement a system for configuring "matchers" for individual log messages based on the contents of the zap log entry. This could be hooked in implementing a custom zapcore.Core to decide whether to write the log.

We actually already do implement a custom Core to override Check() to implement an include/exclude list by logger name:

https://github.com/caddyserver/caddy/blob/afca2421112eb89e435888f50f8146364a3c60a4/logging.go#L550-L557

This is a good start and may usually be enough for most users to hide certain logs, but it's possible that sometimes only specific patterns of logs from a particular logger should be hidden, not the entire logger.

My idea is that we add support for pluggable log matcher modules which take in the zapcore.Entry and []zapcore.Field and can inspect its contents to decide whether we want to log it. It would go in the config in here https://caddyserver.com/docs/json/logging/logs/ (i.e. just under include and exclude). Not sure what we'd call it, maybe checkers? It would be an array of modules that implement an interface like:

interface LogEntryChecker {
	ShouldSkip(zapcore.Entry, []zapcore.Field) bool
}

We'd have to run this checking logic in Write(Entry, []Field) error since that's the only place we have access to both the entry and the fields. This was suggested as a solution in https://github.com/uber-go/zap/issues/1021#issuecomment-962119581 as an answer to a similar question.

We could provide simple built-in matchers like an exact value match of a particular field by path, like request>method (similar to filter encoders, a not matcher which wraps other matchers to invert the boolean result, etc.

So all that said, this is not an insignificant amount of work, so we'd like to hear what problems this could solve for users. Feedback is welcome on this idea!

Apr 08 '22 18:04 francislavoie

Thanks Francis. This would help get rid of the noise in the logs. I use Caddy to run a blog, and every page can initiate several requests when a user lands on it (css files, scripts, images). I don't really need these information in my logs.

Also, since every page visit can shoot 5 or more requests, the log file is unnecessarily growing in size. So I can end up with thousands of logs, but I only need 20% of them.

Having the ability to ignore certain access logs would help reducing the noise and saving space for what really matters.

Apr 08 '22 19:04 AliAlhajji

To be clear @AliAlhajji, that is answered by https://github.com/caddyserver/caddy/issues/4689.

This issue is specifically about other kinds of logs that are not access logs.

Note the second paragraph in the issue description.

Apr 08 '22 19:04 francislavoie

Hey @leodido as the author of https://github.com/leodido/caddy-conditional-logging do you have any thoughts about this proposal?

We can also implement expression matching using CEL like we already do for request matching. https://caddyserver.com/docs/caddyfile/matchers#expression

Apr 09 '22 22:04 francislavoie

Any progress on this? It is completely impossible to follow the logs with the spam I get from "aborting with incomplete response" (#4500) followed by a large JSON spanning multiple rows.

Nov 22 '23 19:11 abjugard

@abjugard I'm of the opinion that filtering logs by and large is a job for the ingestor/consumer, not the producer. The more logic we do around logs the slower it will become and the more complicated things will get.

Ideally whatever you're using to view your logs lets you filter out unwanted entries.

Maybe there's room for a Caddy app that ingests its own logs, I dunno. 🤷‍♂️

Dec 13 '23 11:12 mholt

The more logic we do around logs the slower it will become and the more complicated things will get.

Ideally whatever you're using to view your logs lets you filter out unwanted entries.

I am not sure the efficiency argument works out as while I do centrally collect, process and filter those logs, it is certainly more computational effort to do that than to limit the amount of logs produced in the first place and ideally I would want to do both to avoid serializing, transmitting, storing, querying and filtering logs I am not interested in in the first place.

(I could certainly write a local filter wrapping Caddy's stderr to limit what goes into the system journal, but even this means one additional process boundary and parsing step which is most likely more expensive and than logic to suppress writing those log messages in the first place.)

Personally, I am somewhat fond of the module-based log levels provided by e.g. Rust's env_logger as a nice compromise between something involved and generic and something simple and performant which I think is basically similar to the approach based on the logger name initially suggested by the OP.

Dec 13 '23 11:12 adamreichold

@abjugard I'm of the opinion that filtering logs by and large is a job for the ingestor/consumer, not the producer. The more logic we do around logs the slower it will become and the more complicated things will get.

Ideally whatever you're using to view your logs lets you filter out unwanted entries.

Maybe there's room for a Caddy app that ingests its own logs, I dunno. 🤷‍♂️

I'm just shoving them into systemd-journal, the same as most I would imagine.

I get your point and for enterprise systems I agree that there's probably some other software in place for receiving and monitoring logs, but for homelabbers there probably isn't, and whether you like it or not your product is very popular with homelabbers.

I also can't stop myself from thinking that surely "not logging something" can't add all that much overhead can it? Sure, allowing configuration per-site means that whenever logs are produced some site-specific configuration must be resolved and that's objectively more CPU cycles and memory IO than not doing it, but there's no disk IO wasted saving such logs, which should in most circumstances be a net positive in performance-metrics.

While site-level configuration would be preferable, I would personally be satisfied being able to globally change the level certain errors are logged at, like for example these "aborted with incomplete" errors are something that comes from the client terminating the connection uncleanly, they are not something I care about on this system, and would value being able to mark any client-caused errors at a different log level that I wouldn't necessarily output to the logs at all, like info or even debug.

Dec 13 '23 12:12 abjugard

Any news on this?

May 03 '24 14:05 knoefel

@knoefel no, this issue is to gauge interest. Please don't +1, give actual feedback. Why would this be useful for you? What would you do with it?

May 03 '24 15:05 francislavoie

@francislavoie Thanks for your reply. We recently added caddy and it's working great as a reverse proxy for next application. But similar to @abjugard we noticed a lot of logs in development where it is hard to find actual meaningful ones from next due to the "aborting with incomplete response" errors. Our current workaround is to disable logs via the config, but we're not really happy with this solution as it swallows potentially important logs. Maybe there is already a better solution for this that I'm not aware of - in that case I'm sorry for spamming this thread.

May 03 '24 15:05 knoefel

@knoefel Your log consumer should be able to ignore the logs that it is not interested in, simply by dropping the bytes.

May 03 '24 15:05 mholt