Using defmt to trace function execution
https://user-images.githubusercontent.com/853158/123317587-cd36e380-d4f3-11eb-938c-1e478464a0fb.mp4
I've hacked together function tracing for a private project which works surprisingly well. I think it's something that we could support with a reasonable amount of effort. I'll describe what I've done and comment on changes that could help polish this up.
Background
I have an STM32H7 running RTIC firmware which has different tasks at different interrupt priority levels. Lots of free RAM (512KB AXI SRAM). Lots of idle CPU (~50%). Can't block or disable interrupts for sync as I have to respond to an interrupt very consistently within microseconds.
I believe this could be made to work for other systems with different requirements. If there's little free RAM the tracing could be triggered at runtime and run until the buffer is full. Systems with less stringent latency requirements could submit smaller buffers more frequently. Selectively enabling tracing (https://github.com/knurling-rs/defmt/pull/519) would help reduce bandwidth requirements. Lower priority messages could be dropped to create space for newer higher priority messages.
Logger
I have a framed BBQueue per priority level, so logging never blocks. If a queue is full we skip the message and increment a counter so we can warn about it later. In the idle task I copy frames from the per-priority queues into a large buffer, encode them with rzCOBS, and transmit over a UART with DMA at 2 Mbps. The logger waits for the host to transmit a character before it starts sending logs, to avoid losing data.
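A minimal sketch of the queue side of this, assuming bbqueue's framed API; the defmt global_logger glue, the rzCOBS encoding, and the DMA transmit are elided, and the names and sizes are illustrative:

```rust
use bbqueue::BBBuffer;
use bbqueue::framed::{FrameConsumer, FrameProducer};
use core::sync::atomic::{AtomicU32, Ordering};

// One queue per priority level so higher-priority tasks never wait on lower ones.
// Producers/consumers come from e.g. QUEUE_HI.try_split_framed() during init.
static QUEUE_HI: BBBuffer<4096> = BBBuffer::new();
static QUEUE_LO: BBBuffer<4096> = BBBuffer::new();
// Messages dropped because a queue was full; reported in a warning later.
static DROPPED: AtomicU32 = AtomicU32::new(0);

/// Called with one complete, defmt-encoded log frame for the current priority.
fn log_frame(prod: &mut FrameProducer<'static, 4096>, frame: &[u8]) {
    match prod.grant(frame.len()) {
        Ok(mut grant) => {
            grant[..frame.len()].copy_from_slice(frame);
            grant.commit(frame.len());
        }
        // Queue full: never block, just count the dropped message.
        Err(_) => {
            DROPPED.fetch_add(1, Ordering::Relaxed);
        }
    }
}

/// Runs in the idle task: copy whole frames into a staging buffer, which is
/// then rzCOBS-encoded and handed to the UART DMA (both elided here).
fn drain(cons: &mut FrameConsumer<'static, 4096>, staging: &mut heapless::Vec<u8, 16384>) {
    while let Some(frame) = cons.read() {
        if staging.extend_from_slice(&frame).is_err() {
            break; // Staging buffer full; flush via DMA before draining more.
        }
        frame.release();
    }
}
```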
Wire Format 2.0 (https://github.com/knurling-rs/defmt/issues/492) is already implementing rzCOBS framing. We could add a generic BBQueue logger implementation (https://github.com/knurling-rs/defmt/issues/460#issuecomment-827116010) to help support various transports. It would be helpful to know ahead of time how much space is required for a log message but that is probably difficult in general.
Function Tracing
I'm using the trace crate's attribute macro to easily generate function enter/exit messages. I've modified it lightly to not use thread-local storage (no indentation) and to use defmt. I manually add TASK: to RTIC task function names to make task switching easier to follow.
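For illustration, an instrumented function conceptually boils down to something like this (a sketch, not the macro's actual expansion; the real output and message format differ):

```rust
// Roughly what #[trace] turns a function into once it logs via defmt instead
// of `log` + thread-local indentation (sketch only).
fn update_setpoint(target: f32) -> bool {
    defmt::trace!("[ENTER] update_setpoint");
    let result = {
        // original function body
        target.is_finite()
    };
    defmt::trace!("[EXIT] update_setpoint");
    result
}
```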
Theoretically we could do sample-based profiling as well by periodically recording the program counter, but I haven't implemented this.
Host Recording
I've modified probe-run and defmt-print to support reading from a serial port and rzCOBS decoding. You can enable an output log file, and the console output can use a log-level filter separate from the log file.
I think it would be nice for probe-run to support serial ports, but perhaps the data could be piped into defmt-print instead. It's nice having it all integrated, though, especially since probe-run can tell the target when it's ready for log data.
Profile Viewing
I wrote a small program that parses the log file to create a Firefox Profiler (https://profiler.firefox.com/) compatible output. I initially targeted Trace Compass (https://www.eclipse.org/tracecompass/), but it was trickier to integrate since it's designed for the complex Linux tracing ecosystem. The Firefox Profiler is obviously designed for multi-thread, multi-process C and JavaScript profiles, but it's flexible enough to work pretty well for this use case. The program sorts the log lines (they may have been reordered by the priority queues) and uses regexes to parse the function trace messages. It keeps track of call stacks and task switching to output profiler samples.
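A rough sketch of that parsing step (the line format matched by the regexes is an assumption here, and the actual Firefox Profiler JSON emission is elided):

```rust
use regex::Regex;
use std::collections::HashMap;

#[derive(Default)]
struct TaskState {
    stack: Vec<String>,
}

/// Turn log lines into (timestamp, task, call stack) samples.
/// Assumes the lines have already been sorted by timestamp.
fn build_samples(log: &str) -> Vec<(f64, String, Vec<String>)> {
    // Assumed line shape: "<timestamp> ... [ENTER] <fn>" / "[EXIT] <fn>", with
    // RTIC task functions carrying the manually added "TASK:" prefix.
    let enter = Regex::new(r"^(?P<ts>[\d.]+) .*\[ENTER\] (?P<f>\S+)").unwrap();
    let exit = Regex::new(r"^(?P<ts>[\d.]+) .*\[EXIT\] (?P<f>\S+)").unwrap();

    let mut tasks: HashMap<String, TaskState> = HashMap::new();
    let mut current = String::from("idle");
    let mut samples = Vec::new();

    for line in log.lines() {
        if let Some(c) = enter.captures(line) {
            let func = c["f"].to_string();
            // Entering a "TASK:" function marks a task switch.
            if let Some(name) = func.strip_prefix("TASK:") {
                current = name.to_string();
            }
            let state = tasks.entry(current.clone()).or_default();
            state.stack.push(func);
            samples.push((c["ts"].parse().unwrap(), current.clone(), state.stack.clone()));
        } else if let Some(c) = exit.captures(line) {
            let state = tasks.entry(current.clone()).or_default();
            state.stack.pop();
            samples.push((c["ts"].parse().unwrap(), current.clone(), state.stack.clone()));
        }
    }
    samples
}
```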
Having custom message tags (https://github.com/knurling-rs/defmt/issues/182) for function call/return and variable value watches would be less of a hack than parsing log messages as text. The custom messages could possibly use a purpose-built efficient encoding, but defmt is already pretty good at sending interned strings.
defmt already works pretty well for this use case, and many of the improvements that are being worked on will help as well. We can at least keep it in mind as we think about changes for the future. I'll see how much of this I can polish up and open source.
@mattico Three years later the ecosystem has shifted slightly, and I wanted to ask whether you have an updated approach to this, because I like the idea. With the deprecation of probe-run, and specifically the removal of the --json flag, I believe this approach has to change slightly. I was curious whether any of your changes ended up being developed further, or whether it ended with this documentation.
I know your solution targeted UART, so you may not have any insight here, but is there a process for recovering target-generated defmt-rtt messages from probe-rs? I don't seem to see the messages when using --log-file with the newer probe-rs run option. I suspect this would require modifications to probe-rs much like the ones you mentioned, but if that data could be recovered, an RTT solution and a UART solution could share everything downstream of that point.
If any of these ideas have already seeped in somewhere over the years and this problem is mostly solved, that would be great to know!
@Knight-Ops In the time since I wrote this I have changed a number of things.
- I shifted from UART to RTT for defmt. I was originally using a separate RTT stream per interrupt priority level, with a lightly modified probe-run to read from all the RTT streams. IIRC the enabler here was using an hs-probe, which is the only thing I could find that could reliably read RTT at high speed.
- Later I ported the firmware from RTIC 1.0 to Embassy, using two async executors at only 2 priority levels, and eventually migrated to just a single standard RTT stream for messages.
- I haven't seen much value from function-level tracing, so my code has atrophied. It was more annoying dealing with the trace macros and custom probe-run fork than the extra value I gained over just doing ad-hoc printf-style profiling.
I still believe this could be a good value-add for the ecosystem with a moderate amount of effort. Nowadays with the rise of OTel and the tracing ecosystem, I wonder if compatibility with some of that could be leveraged? Perhaps a tracing_subscriber_defmt? I bet there would be issues with no_std and binary size as usual.
This actually sounds almost exactly like what I was looking to implement. This may be a conversation better had in an issue on Embassy, but I don't think the rtos_trace functionality really provides enough insight into the operation of Embassy, which is the problem I am trying to solve (specifically, profiling tasks across cores, as well as profiling the executor). I have been using the Pico debug probe for my RTT reading, so I'm not sure how fast it is; I haven't seemingly run into issues, but I also haven't been reading the raw streams. Sadly the hs-probe isn't easily available in the store anymore, or in any Tindie store I could find, and since this is a side-quest of a side-quest, I'll just hope the Pico debug probe works for the time being.
I appreciate the insights after such a long time. I have some work to do getting more familiar with the RTT stream extraction using my current setup, but I do like the idea of possibly using a tracing subscriber in this situation for more unification. Specifically point 1 is the place where I am still working and need to do some more research as it relates to defmt. I think essentially I just need to get familiar enough with probe-rs to figure out where to hook in to extract the logs I want for RTT. If you have any up to date resources you have run across for that, feel free to send them my way. Looks like the defmt book still uses probe-run.
Yeah, the hs-probe was a one-time side project of a few probe-rs contributors. I'm surprised there aren't other options; I looked for a long time, but maybe I missed something. I think 95%+ of users don't actually need a USB 2.0 high-speed debugger, which is probably the reason few exist. I have a 480 MHz microcontroller and had many, many function calls instrumented, and I also needed the RTT reading to take a minimal amount of time. IIRC when the debugger is reading the RTT buffer it basically pauses core execution, and my device needed a reliable low-latency response to an interrupt. tl;dr: I think it's likely that you don't need a super fast probe, even though I did.
> I think essentially I just need to get familiar enough with probe-rs to figure out where to hook in to extract the logs I want for RTT. If you have any up to date resources you have run across for that, feel free to send them my way.
So I have switched to probe-rs run semi-recently, and I can't remember if I have used it with the tracing stuff before, but I don't see where the difficulty would be. If you could elaborate on what you're having trouble with, maybe I could help further. Basically, in my .cargo/config I have my runner set to probe-rs run, e.g.
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
runner = "probe-rs run --chip STM32H743XIHx --probe 1209:4853:0D001C001650563641333620 --speed 24000 --connect-under-reset"
and then I cargo run, and it automatically builds, flashes, and runs. probe-rs should automatically detect the RTT stream and then detect and decode the defmt data. IIRC in the past those had to be manually configured, but I don't have any embed.toml files in my project anymore. I think you can just add a --log-file argument to have the RTT logs saved to a file. Then write a little utility which uses regexes or whatever to parse the log file and produce a Firefox profile JSON file. Mine was developed for a proprietary customer so I'm not at liberty to share it, but it isn't particularly tricky. If you can work with a noncommercial license, it looks like this code from functiontrace will get you 90% of the way there. Otherwise it isn't too hard to work out the required JSON types from the Firefox documentation/code and a few example profiles.
FWIW I've also put together a few related things for a project at work.
I made a proof-of-concept data collector (we call them reflector plugins) for a customer that will ingest defmt logs (and optionally RTIC trace instrumentation) over RTT using probe-rs:
https://github.com/auxoncorp/modality-defmt-plugins
It leans on an RTIC fork I made that adds defmt trace points around the various RTIC facilities (customer was ok with using a modified version of RTIC).
It was sufficient for our customer use case at the time, so I haven't pushed on further trace enablement in RTIC or elsewhere, but might be a useful reference for someone.
There's a test_system directory with an example that runs in renode if anyone wants to poke around.
In case it's helpful, here's another couple of personal tools I've slapped together for reading RTT data and defmt logs (both using probe-rs):
- https://github.com/jonlamb-gh/rtt-reader
- https://github.com/jonlamb-gh/defmt-rtt-printer

The former has been handy for other non-Rust systems I deal with at work (FreeRTOS with TraceRecorder tracing).
Hi there, I am very excited about this conversation. Indeed, I was about to write a small tracing crate for Embassy before I read this thread. My plan is not very different from yours, except I would like to be compatible with the Perfetto trace format so I can use the same tool for Linux and Embassy apps.
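As a rough sketch of what I have in mind: Perfetto's UI can import the plain Chrome trace-event JSON format, so a first cut of the exporter could be as small as this (the record layout is made up, and string escaping is skipped for brevity):

```rust
/// Sketch: emit Chrome trace-event JSON, which Perfetto can import.
/// `records` holds (timestamp_us, is_enter, function_name, task_id) tuples
/// produced by whatever parses the defmt log; that layout is hypothetical.
fn write_trace_events(records: &[(f64, bool, String, u32)]) -> String {
    let mut events = Vec::new();
    for (ts, is_enter, name, tid) in records {
        // "B"/"E" are begin/end duration events; "ts" is in microseconds.
        events.push(format!(
            r#"{{"name":"{}","ph":"{}","ts":{},"pid":0,"tid":{}}}"#,
            name,
            if *is_enter { "B" } else { "E" },
            ts,
            tid
        ));
    }
    format!("[{}]", events.join(","))
}
```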
@mattico IIUC, you cannot share your work on defmt/embassy/RTT streams, right?
Last but not least, I would like to make a fully open-source library for this, as I think it could really help the community.
@inthehack Interesting! Perfetto looks like a good fit, I'm not sure why I didn't use that. Perhaps it wasn't as mature back then.
No, I can't share the code I did for this. I didn't really implement it in a generalizable way; it's more a collection of hacks specific to my (proprietary) firmware. I don't think it'd be a very useful starting point anyway, being a bunch of forks of years-old libraries.
Good luck!
@mattico Ok no worries, I understand. Thank you for your reply anyway 🙏🏼