[BUG] Error handling on memory allocation failure

Open plbossart opened this issue 1 year ago • 8 comments

Describe the bug

When the firmware fails a memory allocation, the error code is not passed on to the host driver. Instead the host only sees a useless "Unsupported operation requested".

See Intel daily tests #/result/planresultdetail/39211?model=LNLM_RVP_NOCODEC&testcase=check-signal-stop-start-capture-50

[ 3994.093620] <err> buffer: buffer_alloc: buffer_alloc(): could not alloc size = 1536 bytes of type = 32
[ 3994.093625] <err> dai_comp: dai_set_dma_buffer: comp:0 0x4 dai_set_dma_buffer(): failed to alloc dma buffer
[ 3994.093636] <err> dai_comp: dai_common_params: comp:0 0x4 dai_zephyr_params(): alloc dma buffer failed.
[ 3994.093641] <err> module_adapter: module_prepare: comp:0 0x4 module_prepare() error -12: module specific prepare failed, comp_id 4
[ 3994.093646] <err> module_adapter: module_adapter_prepare: comp:0 0x4 module_adapter_prepare() error fffffff4: module prepare failed
[ 3994.093650] <err> pipe: pipeline_prepare: pipe:0 0x0 pipeline_prepare(): ret = -12, dev->comp.id = 4
[ 3994.093655] <err> ipc: ipc4_pcm_params: ipc: pipe 0 comp 0 prepare failed -12
[ 3994.093663] <inf> component: comp_set_state: comp:0 0x4 comp_set_state(), state already set to 1
[ 3994.093668] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 7
[ 3994.093627] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0x13000004|0x0: GLB_SET_PIPELINE_STATE
[ 3994.094083] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000007|0x0: GLB_SET_PIPELINE_STATE
[ 3994.094092] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: FW reported error: 7 - Unsupported operation requested
[ 3994.094128] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc error for msg 0x13000004|0x0
[ 3994.094131] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on SSP1 Pin: -22
[ 3994.094135] kernel:  Port1: ASoC: trigger FE cmd: 1 failed: -22

Expected behavior

A clear and concise error message instead of a useless log.

plbossart avatar Mar 27 '24 21:03 plbossart

Btw, this particular alloc failure is tracked in a separate bug: https://github.com/thesofproject/sof/issues/8966

Let's use this to track how the error is reported to host.

kv2019i avatar Apr 02 '24 11:04 kv2019i

Added for v2.10, could be one of the cleanups.

lgirdwood avatar Apr 16 '24 12:04 lgirdwood

this is one of the effects of two co-existing error code families in the firmware: POSIX and IPC4

lyakh avatar May 28 '24 13:05 lyakh

@lgirdwood @plbossart I don't understand why this is a production blocker. I agree it's a confusing error message, but an error is still reported.

kv2019i avatar Jun 05 '24 08:06 kv2019i

presumably in a production firmware, the memory allocation failure doesn't happen? I think this is more for developers and integration phases, it's just very painful to make progress.

plbossart avatar Jun 05 '24 09:06 plbossart

@plbossart Ack. I was perfectly fine with the v2.10 tag, as this was/is slowing down development and debug. But now that the v2.10 window has closed, it's a different question whether this should block the quarterly release, and my view is that it should not. UPDATE: to confirm, the actual alloc failure that prompted this bug has been fixed in https://github.com/thesofproject/sof/issues/8966, so in production releases the alloc should indeed never fail, and any known case where it does should of course be fixed before a release.

kv2019i avatar Jun 05 '24 10:06 kv2019i

No comments or responses, and as we are beyond rc1 and no case has been presented for why this should block 2.10, moving to v2.11.

kv2019i avatar Jun 25 '24 09:06 kv2019i

FYI @lyakh , pushing to v2.12.

kv2019i avatar Sep 13 '24 07:09 kv2019i

@lyakh any update? I set a priority for this now and moved to v2.13 (as 2.12 has branched).

kv2019i avatar Jan 03 '25 11:01 kv2019i

> this is one of the effects of two co-existing error code families in the firmware: POSIX and IPC4

More specifically, this error code misreporting happens here: https://github.com/thesofproject/sof/blob/1d1b1dd75cc1c2e9480a061a9bb4d3b247a1e8b7/src/ipc/ipc4/handler.c#L343-L345. Fixing it requires a proper mapping between POSIX and IPC4 error codes.
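
As a rough illustration of what such a mapping could look like, here is a minimal sketch in C. It is not the code in handler.c. The enum, its member names, and the function `example_posix_to_ipc4()` are hypothetical stand-ins. All numeric values are assumptions, except 7, which is the "Unsupported operation requested" code seen in the log above. They would need to be checked against the firmware's actual IPC4 status definitions.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the firmware's IPC4 status codes.
 * Names and values are illustrative assumptions; only 7 is taken
 * from the log in this issue ("Unsupported operation requested").
 */
enum example_ipc4_status {
	EXAMPLE_IPC4_SUCCESS = 0,
	EXAMPLE_IPC4_INVALID_PARAM = 1,
	EXAMPLE_IPC4_OUT_OF_MEMORY = 3,
	EXAMPLE_IPC4_BUSY = 4,
	EXAMPLE_IPC4_FAILURE = 6,
	EXAMPLE_IPC4_INVALID_REQUEST = 7,
};

/*
 * Translate a negative POSIX errno (as returned by e.g. a failed
 * prepare step, -12 == -ENOMEM in the log) into an IPC4-style status,
 * so that the host sees the real cause instead of a fixed code.
 */
static enum example_ipc4_status example_posix_to_ipc4(int err)
{
	switch (err) {
	case 0:
		return EXAMPLE_IPC4_SUCCESS;
	case -ENOMEM:
		return EXAMPLE_IPC4_OUT_OF_MEMORY;
	case -EINVAL:
		return EXAMPLE_IPC4_INVALID_PARAM;
	case -EBUSY:
		return EXAMPLE_IPC4_BUSY;
	case -ENOTSUP:
		return EXAMPLE_IPC4_INVALID_REQUEST;
	default:
		/* Anything unmapped becomes a generic failure. */
		return EXAMPLE_IPC4_FAILURE;
	}
}
```

With something along these lines, the -12 from pipeline_prepare() in the log above would reach the host as an out-of-memory status rather than as code 7.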

lyakh avatar Feb 20 '25 11:02 lyakh

Nobody assigned, moving to v2.14

kv2019i avatar Apr 23 '25 13:04 kv2019i

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] avatar Jul 24 '25 09:07 github-actions[bot]