seL4 icon indicating copy to clipboard operation
seL4 copied to clipboard

Add kernel log for debug tracing

Open xurtis opened this issue 5 years ago • 7 comments

This extends the existing benchmarking log and logging API for debugging purposes. It provides a general set of tools for describing and logging kernel events of varying size to a memory frame visible at user-level.

The intent for this PR is to enable new logging events to be added to trace kernel behavior in debugging situations in a manner that allows multiple different kinds of events to be traced together and to make it easy to add log events to the kernel source while debugging it without having to fully determine a new messaging format each time.

Rather than adding a new log buffer and a new set of syscalls, this PR adds a kernel configuration option that allows the benchmark log buffer to be enabled and configured in non-benchmark debug configurations and a messaging format with variable-length messages to the logs.

The nature of the messaging format prohibits existing messaging formats in the log buffer from being used at the same time, so this PR also implements a set of entry tracking messages that can be used along with any other added messages in the newer message format.

Much like the existing log message formats, the new format with messages of varying size only ever requires the kernel to write to the log region itself and always writes in a linear fashion through the log region so should demonstrate similar minimal use of the cache, although the number of cache lines accessed for each message write may be greater due to the lack of cache-line alignment on messages.

The impact of the new message format on system performance has not been checked and so logging in performance-sensitive cases should continue to use the existing benchmark logging APIs. It may well be that this new log format has sufficiently low overhead that it replaces the existing log buffer message formats, although removing those formats should be considered in a separate PR.

This change does not resolve the existing issues of the kernel log buffer for SMP configurations whereby all cores access the same log buffer region and the same global internal kernel state for tracking the kernel's write position in the buffer. The easiest solution for that problem would be to maintain a distinct cursor for each core and to divide the region into a set of smaller per-core logs. It would then require user-level to reset each core's log separately by some change to the syscall API. This fix is out of scope of this PR but could be implemented and proposed in a separate PR.

A complimenting set of tools will be added to seL4_utils for managing and exporting the log-buffer at user-level for offline analysis. seL4/seL4_libs#25 demonstrates how this would be used at user level.

xurtis avatar Feb 07 '21 22:02 xurtis

Can you please provide more context? AFAICT this PR does the following:

  • Adds a new type of log buffer for use debugging, which is more flexible as it doesn't care about performance.
  • Duplicates (?) the kernel entry tracking with this new debug log buffer.

IMO any debug tracing should be fairly performance sensitive as it is likely going to be used a lot for debugging, and debugging a system with different timing and cache behaviour is not fun as behaviour can diverge too much from the non-debug system.

Also, how does this work for multicore?

pingerino avatar Feb 07 '21 23:02 pingerino

I've updated the PR description above and added a complimentary user-level PR to seL4_Libs for further context. As for current use cases, I've been using this to get detailed tracing of scheduler decisions to get a more precise trace of the MCS scheduler behavior, I also intend to use it for tracing kernel behavior in scenarios where the existing logging features don't provide sufficient detail on scheduling behavior.

I didn't initially implement this with the intent of it being particularly performance sensitive, however I have revised it to hopefully be as minimal an overhead to performance as possible. The addition of the entry tracing here was simply because I wanted to still get that trace at the same time as also getting other logging events, and the existing trace logging could only be used if it was the only message format in the log buffer.

I can run some numbers and see what the overhead is; the current implementation may be similarly low overhead to the existing message formats and, as it uses the same mechanism for the log buffer itself, it also benefits from the mapping and cache configurations used for the existing benchmark log buffer where platforms support specifying write-through caching in the mapping (risc-v being the notable exception).

xurtis avatar Feb 08 '21 00:02 xurtis

What do we do with this one? @xurtis I assume you're not going to be able to finish it.

The PR as such seems close, and could be completed, but I'm unclear on how needed the functionality is. If nobody is exercising this code, it'd be better to not merge and close the PR instead. It'd still be available for re-opening.

Any opinions?

lsf37 avatar Sep 13 '21 04:09 lsf37

Actually, I very interested in this feature and keep advertising this to colleagues. There are just not enough resources to work on this at the moment.

axel-h avatar Sep 13 '21 12:09 axel-h

Ok, you be happy to be assigned to this and the PR being kept on draft until it is done?

lsf37 avatar Sep 14 '21 00:09 lsf37

@lsf37: yes.

axel-h avatar Sep 14 '21 00:09 axel-h

I've rebased this at https://github.com/axel-h/seL4/tree/kernel-debug-log

axel-h avatar Nov 26 '21 10:11 axel-h