[FEATURE][ChromeOS] Expose DSP load tracker info for visibility to ChromeOS
Is your feature request related to a problem? Please describe. Currently we don't have much DSP information in the system logs attached to feedback reports for audio issues. To debug such issues we need to rebuild the kernel with the trace feature enabled and capture logs with sof-logger. It would be super helpful for debugging if DSP load tracker info could be obtained without those debugging steps. Moreover, we could set up an autotest item to track DSP load performance for all supported products.
Describe the solution you'd like I don't have a clear idea of the most appropriate/simple way to do this. It could either be actively reported by the DSP, or provided on demand at the host's request. For the active approach, it should be enough to report critical info only (e.g. DSP load is over a threshold), in order not to add more overhead.
Describe alternatives you've considered None
Additional context None
@plbossart @ujfalusi @ranj063 any thoughts ?
When this is implemented, feature needs to be documented to https://thesofproject.github.io/latest/developer_guides/debugability/perf-counters/index.html?highlight=performance
I'd say an IPC to request this information from FW on demand would be the most suitable solution. So we could e.g. have a debugfs interface (exposed by the SOF Linux driver) and when CRAS queries it (at the rate it chooses), the SOF driver sends an IPC and FW replies with the current DSP load tracker data.
@kv2019i we have mtrace and Zephyr oops now, anything else pending ?
There is one log line that may cover this topic:
[ 65.782163]
Simply checking overruns tells whether the DSP is overloaded.
@lgirdwood @btian1 This is not done. The request is to have this data in user-space without enabling the FW logs (disabled in Chromebooks by default). So this requires new DSP+kernel infra. In IPC4, we have messages to report load measurements, but these are not implemented yet (neither on the FW nor on the kernel side). So this is still a valid ask and I'm not aware of any ongoing work to implement these (we discussed the IPC4 messages in question with the cycles-budget (CPC) topic around the manifest).
The request is to have this data in user-space without enabling the FW logs (disabled in Chromebooks by default). So this requires new DSP+kernel infra.
Why is this not just a log level configuration problem? https://docs.zephyrproject.org/latest/services/logging/index.html
Logging API is highly configurable at compile time as well as at run time. [...] During compilation logs can be filtered out on module basis and severity level. [...] Logs can also be compiled in but filtered on run time using dedicate API. Run time filtering is independent for each backend and each source of log messages. Source of log messages can be a module or specific instance of the module.
The old sof-logger has similar capabilities.
So the idea to start another, new and entirely different logging framework sounds weird to me. @johnylin76, @andyross, @cujomalainey why are Chromebooks "throwing the baby out with the bathwater" and disabling logs entirely, as opposed to selectively keeping the logging they need/want?
@marc-hb at least on pre-ace systems I have been warned that enabling logging hijacks a DMA (which PNP is not happy about when it comes to power) and also it can block power states. Trust me I would love to have even just ERR/WARN on.
The old sof-logger uses either shared memory for high levels or DMA for all levels or both at the same time. It's configurable.
I'm not familiar with all Zephyr logging possibilities yet but I know for sure that the way it's configured does NOT use any DMA right now in our CI, it relies on shared memory "mtrace" only (which means it can overrun and lose statements; "you get what you paid for")
@cujomalainey @marc-hb That's a good question. The "mtrace" Zephyr backend is friendlier to system flows (it doesn't take a DMA channel), but it's still not zero cost. There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's an impact on code size (although we already have dictionary formats in Zephyr up and running, limiting this). But in the end it will depend on the specific product. Our current assumption is still that, at least, not all release firmwares will have logging enabled (like has been the case with SOF previously). I.e. key debugging tools (like the ability to extract a FW coredump) will be available even if you don't have logging enabled. Having a way to access DSP load information without logging seems like a reasonable feature, similar to the coredump. I think we are running out of runway for v2.6 though, so this probably needs to be pushed out to v2.7.
There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's impact to code size (although we already have dictionary formats in Zephyr up and running limiting this).
Logging never comes for free; OK. Something Heisenberg something?
But how would a new technique focused on logging the DSP load only avoid all these costs? By being uber-specialized and optimized for reporting DSP load only? Then someone else realizes that "oh, wait, we also need the ability to log this other critical thing too in production builds... and that other one..."
Maybe I'm missing something, but until fine-tuning the configuration of the existing logging framework has been attempted and measured, starting a brand new technique seems to fall in the "premature optimization" category.
I.e. key debugging tools (like ability to extract a FW coredump), will be available even if you don't have logging enabled
If that's already available without logging then it's different; it really does come "for free". Coredumps feel quite different from regular logging too; for instance you don't need a long-running thread streaming a series of coredumps :-)
There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's impact to code size
Will happily pay that price if it means debugging time is cut because we are not left blind to errors in the field.
Our current assumption is still that at least not all release firmwares will have logging enabled (like has been the case with SOF previously)
I think the bigger issue was that we didn't enable it in the kernel, which meant the resources were never allocated. So even though the firmware was ready, the kernel side was never enabled.
I think the load data could even be host triggered for our purposes into the log, but we need to be able to run the log in production without PNP getting annoyed.
@kv2019i , agree with your comment on running out of runway for v2.6. Moving to v2.7.
@kv2019i can we get the overall LL load per core in SRAM window ? @ujfalusi will probably need a kernel debugFS to read it.
SOF drivers sends an IPC and FW replies with current DSP load tracker data.
@ujfalusi will probably need a kernel debugFS to read it.
These seem to imply a "poll" style of user interface where a process must run and regularly poll the firmware.
The description and some other comments seem to lean more towards a "push" style of interface where the firmware regularly logs the data somehow and somewhere (a different question). In the latter case, the interval is controlled by the firmware so it is more predictable.
@cujomalainey (who ~~filed this issue~~ has been requesting this feature) do you have a preference?
SRAM is FW push. DebugFS can be blocking read() for an update every 1sec using existing FW MCPS infra.
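For illustration, a minimal sketch of what such a poll-style consumer could look like; the debugfs node path and the one-snapshot-per-read() behaviour are assumptions here, not an agreed interface:

```python
#!/usr/bin/env python3
# Minimal sketch of a poll-style consumer of a DSP load debugfs node.
# The path below is hypothetical; the real name/location is TBD.
import time

DEBUGFS_NODE = "/sys/kernel/debug/sof/dsp_load"  # hypothetical path

def watch_dsp_load():
    # Assuming the kernel implements a blocking read() that returns one
    # complete snapshot per ~1 s update, no client-side sleep is needed;
    # otherwise add a time.sleep() between iterations.
    while True:
        with open(DEBUGFS_NODE, "rb") as f:
            snapshot = f.read()
        print(f"{time.strftime('%H:%M:%S')}: {len(snapshot)} bytes of load data")

if __name__ == "__main__":
    watch_dsp_load()
```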
No preference, @johnylin76 thoughts?
The DebugFS approach @lgirdwood mentioned above looks good to me. I suppose that the feature doesn't affect the DSP idle state, i.e. loads are calculated and reported only when the DSP core is active. Is that correct?
Correct, this data is already tracked today within FW. Reporting it every 1 second via debugFS should not be too difficult.
Long term it would be great to have a reporting mechanism where we could see all threads and system loads per core, number of IRQs, etc., but this is something that Zephyr will need to export. @andyross maybe something you can look at long term?
We have had lot of focus on the per-module measurements infra in v2.8 cycle, so this has not progressed. The ask is still valid, so let's move it to v2.9 and see if we can finally do the final bit of publishing this to the host side.
Let's target LL first, then DP second. I think we have all the data collected now in FW for LL, as CI is able to parse the logs per module.
@jsarha for v2.9 we need something initially that can provide data in an unused RO SRAM window (for Intel - other vendors will choose their memory locations) that can be read by debugFS as binary data with userspace parsing, i.e. a Python script would read and parse the debugFS data and display the contents. Later on it could use other transports like probes.
- The SRAM Windows and FW data collection will be disabled by default - it will be enabled at runtime via a new IPC message, i.e. the IPC to enable can be sent by the kernel when debugFS is open() and the IPC to disable when debugFS is close().
- There will be a second debugFS file that allows the FW data collection to be tuned to the needs of the developer. Each data payload will have an ID (uint32_t) and if the host echoes the ID to debugFS then that sends an IPC (set large config) to enable that feature, i.e.
echo 123, 789, -456 > /sys/debug/blah/sof_blah
This will enable data items with IDs 123 and 789 and disable ID 456 (negative).
- The FW data won't be parsed by the kernel (except for the total size & magic). It will just be passed through. It should have a generic header that is easy to parse, e.g.
```c
struct payload_hdr {
	uint32_t total_size;	// total size of the whole payload in bytes
	uint32_t hdr_size;	// size of this header only in bytes
	uint32_t magic;		// used to identify valid data
	uint32_t abi;		// ABI version, can be used by tool to warn users if too old
	uint64_t tstamp;	// aligned with host epoch
};
```
Following this will be the data structures/containers. They should all lead with a standard header that the tool can use e.g.
```c
struct elem_hdr {
	uint32_t size;	// size of this elem in bytes
	uint32_t id;	// ID of this elem type
};

/* elem body follows here */
```
This way the Python tool can read and parse the data, and it should ignore any IDs it does not know about.
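To make the tool-side flow concrete, here is a rough Python sketch of how such a script could walk the payload. Only the two headers above come from this comment; the debugfs path, the little-endian packing and the way elements are skipped are assumptions for illustration:

```python
#!/usr/bin/env python3
# Rough sketch of the userspace parser outlined above. The struct layouts
# mirror payload_hdr / elem_hdr; the node path and endianness are
# illustrative assumptions only.
import struct

TELEMETRY_NODE = "/sys/kernel/debug/sof/telemetry"  # hypothetical path
PAYLOAD_HDR = struct.Struct("<4IQ")  # total_size, hdr_size, magic, abi, tstamp
ELEM_HDR = struct.Struct("<2I")      # size, id

def parse_payload(data: bytes):
    total_size, hdr_size, magic, abi, tstamp = PAYLOAD_HDR.unpack_from(data, 0)
    # The kernel only validates total size & magic; all real parsing happens here.
    elems = []
    offset = hdr_size
    while offset + ELEM_HDR.size <= total_size:
        size, elem_id = ELEM_HDR.unpack_from(data, offset)
        if size < ELEM_HDR.size:
            break  # malformed element, stop rather than loop forever
        # Keep the raw body; a real tool would dispatch on elem_id and
        # silently ignore IDs it does not know about.
        elems.append((elem_id, data[offset + ELEM_HDR.size:offset + size]))
        offset += size
    return magic, abi, tstamp, elems

if __name__ == "__main__":
    with open(TELEMETRY_NODE, "rb") as f:
        magic, abi, tstamp, elems = parse_payload(f.read())
    print(f"magic=0x{magic:08x} abi=0x{abi:08x} tstamp={tstamp}")
    for elem_id, body in elems:
        print(f"elem id={elem_id} len={len(body)}")
```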
Stable-v2.9 branched, this didn't make the cut, bumping to 2.10. There is progress with both Zephyr thread analyzer and with a more generic telemetry interface, but neither is ready for merge for SOF2.9.
@jsarha can you give an update of any PRs and output to date. Thanks !
I have a couple of PRs to address this issue. First, there are Intel ADSP fixes for the Zephyr Thread Analyzer:
- https://github.com/zephyrproject-rtos/zephyr/pull/71408
And then two other PRs, depending on the first one, that still need some polishing but are functional and working at the moment:
The telemetry2 and thread info FW code, and the Python client to decode the associated debugfs data.
- https://github.com/thesofproject/sof/pull/9084
Simple debugfs code for the Linux SOF driver to map the telemetry2 window to a debugfs file.
- https://github.com/thesofproject/linux/pull/4956
Here is example output of the current implementation:
```
python3 thread-analyzer.py
Open /sys/kernel/debug/sof/telemetry2 failed '[Errno 14] Bad address'
core_count: 3
Core 0 load 0%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.0% stack 19.6%
idle 00 cpu 99.6% stack 33.7%
Core 2 load 0%
idle 02 cpu 99.6% stack 27.1%
0x40118570 cpu 0.0% stack 47.1%
2 thread info cpu 0.0% stack 47.1%
Core 0 load 0%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.0% stack 19.6%
idle 00 cpu 99.6% stack 33.7%
Core 2 load 6%
ll_thread2 cpu 0.0% stack 18.8%
idle 02 cpu 93.3% stack 28.6%
0x40118570 cpu 0.4% stack 47.1%
2 thread info cpu 0.0% stack 47.1%
Core 0 load 0%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.0% stack 19.6%
idle 00 cpu 99.6% stack 33.7%
Core 2 load 5%
ll_thread2 cpu 5.5% stack 18.8%
idle 02 cpu 94.1% stack 28.6%
0x40118570 cpu 0.0% stack 47.1%
2 thread info cpu 0.0% stack 47.1%
Core 0 load 0%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.0% stack 19.6%
idle 00 cpu 99.6% stack 33.7%
Core 2 load 5%
ll_thread2 cpu 5.5% stack 18.8%
idle 02 cpu 94.1% stack 28.6%
0x40118570 cpu 0.0% stack 47.1%
2 thread info cpu 0.0% stack 47.1%
Core 0 load 0%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.0% stack 19.6%
idle 00 cpu 99.6% stack 33.7%
Core 2 load 5%
ll_thread2 cpu 5.5% stack 18.8%
idle 02 cpu 94.1% stack 28.6%
0x40118570 cpu 0.0% stack 47.1%
2 thread info cpu 0.0% stack 47.1%
Core 0 load 1%
0x401184a8 cpu 0.0% stack 2.0%
0x401183e0 cpu 0.0% stack 2.0%
1 thread info cpu 0.0% stack 8.2%
0 thread info cpu 0.0% stack 47.1%
edf_workq cpu 0.0% stack 8.2%
sysworkq cpu 0.0% stack 31.8%
logging cpu 0.8% stack 19.6%
idle 00 cpu 98.4% stack 33.7%
```
- The SRAM Windows and FW data collection will be disabled by default - it will be enabled at runtime via a new IPC message. i.e. the IPC to enable can be sent by kernel when debugFS is open() and IPC to disable when debugFS is close().
@lgirdwood What if you want this FW data collection to start immediately without waiting for user-space to enable this with an IPC?
Take the example of the trace: it's on immediately. Userspace can select to read it or not.
Kconfig initially, but there could be later updates that allow developers to switch it on/off via other methods.
@lgirdwood if it's a Kconfig, then why would you need an IPC to turn the data collection on?
I don't know about this case specifically, but you can have a default value in Kconfig for boot time and the ability to change it again at runtime; it's not unheard of.
(you can also have the Kconfig ability to strip the code and save space but that's a different Kconfig)