[BUG] Error handling on memory allocation failure
Describe the bug
When the firmware fails a memory allocation, the error code is not propagated to the host driver. Instead, the host sees an unhelpful "Unsupported operation requested".
See Intel daily tests #/result/planresultdetail/39211?model=LNLM_RVP_NOCODEC&testcase=check-signal-stop-start-capture-50
[ 3994.093620] <err> buffer: buffer_alloc: buffer_alloc(): could not alloc size = 1536 bytes of type = 32
[ 3994.093625] <err> dai_comp: dai_set_dma_buffer: comp:0 0x4 dai_set_dma_buffer(): failed to alloc dma buffer
[ 3994.093636] <err> dai_comp: dai_common_params: comp:0 0x4 dai_zephyr_params(): alloc dma buffer failed.
[ 3994.093641] <err> module_adapter: module_prepare: comp:0 0x4 module_prepare() error -12: module specific prepare failed, comp_id 4
[ 3994.093646] <err> module_adapter: module_adapter_prepare: comp:0 0x4 module_adapter_prepare() error fffffff4: module prepare failed
[ 3994.093650] <err> pipe: pipeline_prepare: pipe:0 0x0 pipeline_prepare(): ret = -12, dev->comp.id = 4
[ 3994.093655] <err> ipc: ipc4_pcm_params: ipc: pipe 0 comp 0 prepare failed -12
[ 3994.093663] <inf> component: comp_set_state: comp:0 0x4 comp_set_state(), state already set to 1
[ 3994.093668] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 7
[ 3994.093627] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13000004|0x0: GLB_SET_PIPELINE_STATE
[ 3994.094083] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000007|0x0: GLB_SET_PIPELINE_STATE
[ 3994.094092] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: FW reported error: 7 - Unsupported operation requested
[ 3994.094128] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc error for msg 0x13000004|0x0
[ 3994.094131] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on SSP1 Pin: -22
[ 3994.094135] kernel: Port1: ASoC: trigger FE cmd: 1 failed: -22
Expected behavior
A clear and concise error message instead of a useless log.
Btw, the alloc failure itself is tracked in a separate bug: https://github.com/thesofproject/sof/issues/8966
Let's use this to track how the error is reported to host.
Added for v2.10, could be one of the cleanups.
This is one of the effects of two co-existing error code families in the firmware: POSIX and IPC4.
@lgirdwood @plbossart I don't understand why this is a production blocker. I agree it's a confusing error message, but an error is still reported.
Presumably, in production firmware this memory allocation failure doesn't happen? I think this matters more for developers and integration phases, where the misleading error makes it very painful to make progress.
@plbossart Ack. I was perfectly fine with the v2.10 tag, as this was/is slowing down development and debug. But now that the v2.10 window has closed, it's a different question whether this is something that will block the quarterly release, and my view is that it is not. UPDATE: and to confirm, the actual alloc failure that prompted this bug has been fixed (https://github.com/thesofproject/sof/issues/8966), so indeed in production releases the alloc should never fail, and any known such case should of course be fixed before any release.
No comments/responses, but as we are beyond rc1 and no case has been presented for why this should block 2.10, moving to v2.11.
FYI @lyakh, pushing to v2.12.
@lyakh any update? I set a priority for this now and moved to v2.13 (as 2.12 has branched).
This is one of the effects of two co-existing error code families in the firmware: POSIX and IPC4.
To be more specific, this error code misreporting happens here: https://github.com/thesofproject/sof/blob/1d1b1dd75cc1c2e9480a061a9bb4d3b247a1e8b7/src/ipc/ipc4/handler.c#L343-L345. To fix it, a proper mapping between POSIX and IPC4 error codes should be implemented.
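For illustration, such a mapping could look roughly like the sketch below. This is not the actual handler code: the enum, its names and the function `example_posix_to_ipc4()` are hypothetical, and the only status value confirmed by the log above is 7 ("Unsupported operation requested"). The other numeric values are assumptions and would need to be checked against the firmware's IPC4 status definitions.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the firmware's IPC4 status codes.
 * Only 7 ("Unsupported operation requested") is confirmed by the
 * log in this issue; all other values are assumptions.
 */
enum example_ipc4_status {
	EXAMPLE_IPC4_SUCCESS = 0,
	EXAMPLE_IPC4_INVALID_PARAM = 1,
	EXAMPLE_IPC4_OUT_OF_MEMORY = 3,
	EXAMPLE_IPC4_BUSY = 4,
	EXAMPLE_IPC4_FAILURE = 6,
	EXAMPLE_IPC4_INVALID_REQUEST = 7,
};

/*
 * Translate a negative POSIX errno value (as returned by the
 * component/pipeline code) into an IPC4-style status, instead of
 * collapsing every failure into one generic code.
 */
enum example_ipc4_status example_posix_to_ipc4(int err)
{
	switch (err) {
	case 0:
		return EXAMPLE_IPC4_SUCCESS;
	case -ENOMEM:
		return EXAMPLE_IPC4_OUT_OF_MEMORY;
	case -EINVAL:
		return EXAMPLE_IPC4_INVALID_PARAM;
	case -EBUSY:
		return EXAMPLE_IPC4_BUSY;
	case -ENOTSUP:
		return EXAMPLE_IPC4_INVALID_REQUEST;
	default:
		/* Anything without a dedicated IPC4 equivalent */
		return EXAMPLE_IPC4_FAILURE;
	}
}
```

With something along these lines, the `-12` (`-ENOMEM`) returned by `pipeline_prepare()` in the log would reach the host as an out-of-memory status rather than as error 7.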
Nobody assigned, moving to v2.14
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.