audio: buffer: fix crash with IPC4 build and DBG log level
A null dereference is hit when comp_update_buffer_produce() or comp_update_buffer_consume() is called with either a NULL buffer->sink or buffer->source. This condition happens regularly in IPC4 builds.
Issue is limited to SOF builds with logging at DEBUG level.
Signed-off-by: Kai Vehmanen [email protected]
With this PR, I can enable DBG level logs and get the logs via mtrace. There are some dropped messages (this is really stretching the capacity of the SRAM log mechanism), but e.g. sof-test cases still pass, so this maybe a useful option for some debug cases.
I once heard that topologies offer the most attack surface, feels correct...
@lyakh wrote:
we should actually find out why these pointers can be NULL. This is called during the
.copy()execution, from the pipeline task, right?
Sure thing, but it is very unhelpful if you cannot enable DBG logs at all. With this PR, you can at least enable the logs and spot this problem by observing invalid value in DBG log output. Without this PR, you will have a DSP fail and with our current infra, no relevant logs to look at.
I need to fix these though: ""/zep_workspace/sof/src/audio/buffer.c:253:33: error: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long unsigned int' [-Werror=format=]""
https://github.com/thesofproject/sof/actions/runs/3191120342/jobs/5207050962
With this PR, you can at least enable the logs and spot this problem by observing invalid value in DBG log output. Without this PR, you will have a DSP fail and with our current infra, no relevant logs to look at.
I suspect @lyakh is saying that DBG is good but not enough. Should there be an ERR somewhere too?
@marc-hb wrote:
I suspect @lyakh is saying that DBG is good but not enough. Should there be an ERR somewhere too?
Somebody who understands the context, please do a PR. I followed @lyakh 's link, but the policy is not at all obvious following that discussion. If this is something that is an error, that probably should be caught at stream setup time, rather than have conditional code always built in in a hot-path like this function.
rather than have conditional code always built in in a hot-path like this function.
Oh yes I didn't mean that, sorry for the confusion. Should likely be done elsewhere (I have no idea where).
However I can imagine how "downgrading" a crash into a mere DEBUG statement can also hide the bigger problem and downgrade a "MUST FIX and catch bad topologies" into a "will fix later -> never".
OK I'll stop trying to guess what @lyakh thinks, I guess this can wait until he's back after the week-end. It just felt like a typical and valid concern.
@kv2019i pls do fix through the print formatters.
@marc-hb wrote:
However I can imagine how "downgrading" a crash into a mere DEBUG statement can also hide the bigger problem and downgrade a "MUST FIX and catch bad topologies" into a "will fix later -> never".
If we know that for sure (a NULL sink or source means a bad topology) then yes, that would be more correct. To me, this is much less clear at this point, whether this is a legal condition or not. And I still claim this should be caught in much earlier when said topology is used to create DSP side objects.
Plus a more temporary POV: if the condition is only hit occasional (some aspect to timing involved), it's currently actually better to emit a log message than crash the DSP, given we still have some limitations to capture full crash output. This is probably the main reason why DBG log level has been unavailable this long as debugging why it doesn't work, has not been at all trivial.
V2: updated to fix the compiler warning.
V3 update:
- moved (unsigned int) cast as only UINT32_MAX may be unsigned long in some builds