Document Severity-Levels

Open mpilman opened this issue 5 years ago • 4 comments

The severity-levels are declared here:

https://github.com/apple/foundationdb/blob/648fc8ec7c3c9f55e6c552f7b281db3c4acc40b4/flow/Trace.h#L45

However, there doesn't seem to be any explanation of what they actually mean. I usually like to define them through production reporting, that is, by the operational response each level should trigger. For example, this is how I usually think about it:

SevError: Should never happen, or something happened that might impact the availability of the cluster. Needs immediate attention (for example, page people).
SevWarnAlways: Something bad happened that might need human attention (for example, a failed disk), but the cluster should be able to survive for another 12 hours or so. Create a ticket.
SevWarn: Something happened that might cause the cluster to not run optimally, but it won't be actionable in the short term.
SevInfo: Everything else that is useful to have in production
SevDebug: Everything that might be useful for testing but shouldn't be logged in production.
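
As a rough sketch of how the definitions above could be written down as comments next to the declaration: the enum below is a stand-in and is not copied from flow/Trace.h, the numeric values are assumptions, and `operatorAction` is a hypothetical helper that only exists to make the "define it through production reporting" idea concrete.

```cpp
#include <cassert>
#include <string>

// Stand-in for the severity enum. NOT the actual declaration in flow/Trace.h;
// the numeric values are assumptions made for illustration.
enum Severity {
    // Useful for testing, but should not be logged in production.
    SevDebug = 5,
    // Everything else that is useful to have in production.
    SevInfo = 10,
    // The cluster might not run optimally, but nothing is actionable in the short term.
    SevWarn = 20,
    // Might need human attention (e.g. a failed disk); the cluster should
    // survive for another 12 hours or so. Create a ticket.
    SevWarnAlways = 30,
    // Should never happen, or cluster availability might be impacted.
    // Needs immediate attention (e.g. page people).
    SevError = 40,
};

// Hypothetical helper: maps a severity to the operational response proposed above.
inline std::string operatorAction(Severity sev) {
    switch (sev) {
    case SevError:
        return "page";
    case SevWarnAlways:
        return "ticket";
    case SevWarn:
    case SevInfo:
        return "none";
    case SevDebug:
        return "not logged in production";
    }
    return "unknown";
}
```

Writing the response next to each level would give contributors a concrete question to ask when choosing a severity: would this event justify a page, a ticket, or nothing?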

I assume that other people don't think about these levels in the same way.

I think at the very least there should be a clear definition, as a comment, of how these levels should be used. Otherwise contributors won't be able to use these traces in a consistent way.

mpilman, May 01 '19 21:05

As you probably remember, there is a discussion here about what the severity levels currently mean:

https://forums.foundationdb.org/t/why-is-a-sev-40-if-ilistener-accept-throws-an-error/663

I agree that these should be more formally documented. Roughly, they are currently something like:

SevError - a condition that leads to the running process terminating (or maybe just the role terminating, I think there's a small amount of variability in how these are used)
SevWarnAlways - a non-fatal condition for the process that is nonetheless useful to be aware of (i.e. this severity is one that could be monitored)
SevWarn/SevInfo - 2 priorities for events that we don't expect to be monitored but that are interesting for production
SevDebug - Events that don't need to be logged in prod.

ajbeamon, May 01 '19 21:05

I remember the discussion. I still don't agree with you there, but that is not the point ;)

I would just like to have a comment in Trace.h that documents this. A forum thread about the severity of listeners running out of file descriptors is probably not the right place to document it.

mpilman, May 01 '19 22:05

Sure, mainly I'm adding this information here for the benefit of whoever does the documenting.

ajbeamon, May 02 '19 15:05

Hi all, FDB only prints logs at level SevInfo and above by default. How can I enable logs at level SevDebug?

TSP-wengle, Aug 12 '22 02:08

You can pass --knob-min-trace-severity=1 or set this knob through a network option (for client-side logging). We currently don't officially support changing the log level, because setting it to anything larger than 10 will cause weird problems (this is imho a bug). Setting it to something smaller than 10 shouldn't cause issues, apart from a small performance regression due to more logging.
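
To illustrate why lowering the knob makes SevDebug events appear, here is a minimal sketch of a minimum-severity filter. It assumes an event is written only when its severity is at or above the configured minimum; `shouldLog` is a hypothetical helper, not FoundationDB's actual code, and the numeric severity values are assumptions.

```cpp
#include <cassert>

// Stand-in severity values, assumed for illustration (not copied from flow/Trace.h).
enum Severity { SevDebug = 5, SevInfo = 10, SevWarn = 20, SevWarnAlways = 30, SevError = 40 };

// Hypothetical filter: an event is written only if its severity is at or
// above the configured minimum trace severity.
inline bool shouldLog(Severity sev, int minTraceSeverity) {
    return static_cast<int>(sev) >= minTraceSeverity;
}
```

Under these assumptions, `shouldLog(SevDebug, 10)` is false while `shouldLog(SevDebug, 1)` is true, which matches the behaviour described in this thread: debug events only show up once the minimum is lowered below 10.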

sfc-gh-mpilman, Aug 12 '22 15:08