syzkaller icon indicating copy to clipboard operation
syzkaller copied to clipboard

RFC: pkg/log: separate error message and arguments?

Open a-nogikh opened this issue 2 years ago • 3 comments

To set up actionable alerts, we need to de-duplicate error messages, and for that we need some format convention.

I propose to stick to a format similar to what is actually already used in most syzkaller code:

2023/07/11 11:52:01 ci2 [INFO] error message: details

What's before the first : is to be treated as a unique error identifier (errors can be merged together by it), what's after -- as details which are just important for debugging.

In some cases, we do want to include some details to the unique part: 2023/07/11 11:52:01 ci-qemu2 [ERROR] manager ci-qemu2-arm32 exited unexpectedly: exit status 1

In other cases, we'd rather merge everything, so the part before : stays constant. 2023/07/11 11:52:01 ci-qemu2 [ERROR] failed to mark job as done: query failed with an error / query timed out / ...

Many popular Go logging libraries follow fully structured logging approach, but I think it'll only complicate things in our case. For our purposes, it's enough to be able to extract timestamp, instance, message level, unique message, details.


Ideally, this separation should be enforced by some static analysis, but I don't see a 100% reliable way to do so.

We could use func Errorf(err string, details string, args ...interface{}), but it doesn't prevent bugs like log.Errorf("%s failed", someString).

So maybe just don't enforce it at all? In the worst case, we'll get either:

  • A too generic alert message that combined a lot of errors into one.
  • Too many alerts that should have been combined together.

Both are easily noticeable and can be fixed by a small code change. Over time we'll bring this to order.


All above is only relevant for [ERROR] and [FATAL]. Ordinary log.Logf ([INFO]) messages are not to be automatically parsed and analyzed anyway, so we can keep on using the simple log.Logf(level, msg, args...) format.

Cc @dvyukov

a-nogikh avatar Jul 11 '23 18:07 a-nogikh

Sounds reasonable.

Any help to enforce this would be useful. Function signature with different arguments for const/non-const part will foce anybody writing code to think about this at least. Without different function signature external contributors won't be even aware of this. A linter check would be useful as well.

We could use func Errorf(err string, details string, args ...interface{}), but it doesn't prevent bugs like log.Errorf("%s failed", someString).

This is easy to catch: no '%' in const err string. Also we want: no ':' in const err string. Also: should always be a string literal. Also: the details string must contain '%' and additional args.

dvyukov avatar Jul 12 '23 08:07 dvyukov

This is easy to catch: no '%' in const err string. Also we want: no ':' in const err string. Also: should always be a string literal. Also: the details string must contain '%' and additional args.

Would we be able to control it a) Statically b) Indirectly: we sometimes use intermediate logging functions, like here

https://github.com/google/syzkaller/blob/bfb20202da73451f997b711447ee68bc3635868e/syz-ci/jobs.go#L765-L767

?

There's definitely some way, of course. The question is whether we can do it with reasonable effort :)

a-nogikh avatar Jul 13 '23 09:07 a-nogikh

We already control it statically including indirect calls (jp.Logf): https://github.com/google/syzkaller/blob/bfb20202da73451f997b711447ee68bc3635868e/tools/syz-linter/linter.go#L299

If we don't check it statically, do you think such bugs won't be introduced? I would assume they will be introduced all the time and we will always catch them when it's already too late (they resulted in wrong monitoring signals). It may even be hard to do the initial conversion reliably.

dvyukov avatar Jul 13 '23 09:07 dvyukov