
JSL:BUG: DSP boot failed after system resume due to memory alloc failed : -12

Open Vamshigopal opened this issue 1 year ago • 11 comments

Describe the bug

On a JSL Chromebook, when the system runs low on memory, a memory allocation fails and the DSP fails to boot after system resume.

To Reproduce

1. Boot the Chromebook.
2. Restrict the system memory to 4 GB.
3. Run memory-intensive workloads.
4. Run suspend_stress_test in parallel.

Reproduction rate

Very rare, but reports from the field are frequent.

Environment

Kernel Branch: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/tags/v6.1.105
Platform: JSL

Logs: dmesg (6).txt

Screenshots or console output:

[ 136.436831] sof-audio-pci-intel-icl 0000:00:1f.3: error: memory alloc failed: -12
[ 136.436865] sof-audio-pci-intel-icl 0000:00:1f.3: error: dma prepare for fw loading failed
[ 136.436869] sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 136.436871] sof-audio-pci-intel-icl 0000:00:1f.3: Failed to start DSP
[ 136.436874] sof-audio-pci-intel-icl 0000:00:1f.3: fw_state: SOF_FW_BOOT_IN_PROGRESS (3)
[ 136.436897] sof-audio-pci-intel-icl 0000:00:1f.3: 0xffffffff: unknown ROM status value
[ 136.436926] sof-audio-pci-intel-icl 0000:00:1f.3: extended rom status: 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff 0xffffffff
[ 136.436929] sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 136.436932] sof-audio-pci-intel-icl 0000:00:1f.3: error: failed to boot DSP firmware after resume -12
[ 136.436937] sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_pm_runtime_get on 0000:00:1f.3: -12
[ 136.436942] SSP1-Codec: ASoC: error at __soc_pcm_open on SSP1-Codec: -12
[ 136.436946] Speakers: ASoC: error at dpcm_be_dai_startup on Speakers: -12
[ 136.436950] Speakers: ASoC: error at dpcm_fe_dai_startup on Speakers: -12
[ 137.461411] sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_pm_runtime_get on 0000:00:1f.3: -22
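For readers triaging similar logs, the trailing negative numbers are kernel errno values. A minimal Python sketch for decoding them from dmesg lines follows. The helper name `decode_errno` and the regular expression are illustrative assumptions, not part of any SOF or ChromeOS tooling, and the sketch relies on the host's errno table matching the kernel's, which holds for common codes such as these on Linux.

```python
import errno
import re

# Assumption: the error code is the last token on the line, written as a
# negative decimal number preceded by whitespace (e.g. "...failed: -12").
_TRAILING_ERR = re.compile(r"\s-(\d+)\s*$")


def decode_errno(line):
    """Return (code, symbolic_name) for a trailing errno, or None.

    symbolic_name is None when the host errno table has no such code.
    """
    match = _TRAILING_ERR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code)


def summarize(log_text):
    """Collect the decoded errno of every line that carries one."""
    results = []
    for line in log_text.splitlines():
        decoded = decode_errno(line)
        if decoded is not None:
            results.append(decoded)
    return results
```

Applied to the log above, -12 decodes to ENOMEM (out of memory) for the firmware-load allocation, and the later -22 decodes to EINVAL, returned on the next attempt to open the stream.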

Vamshigopal avatar Aug 30 '24 06:08 Vamshigopal

@plbossart @ujfalusi @kv2019i

Vamshigopal avatar Aug 30 '24 06:08 Vamshigopal

I am not sure what you would expect from the audio driver @Vamshigopal. The system has a problem or some sort of memory leak. There's not much we can do here.

IIRC JSL does not rely on the IMR boot, maybe that's part of the problem. The firmware would be too old to make use of enhanced capabilities.

plbossart avatar Aug 30 '24 07:08 plbossart

Yes @plbossart, I understand that there is not much we can do if there is no memory left. But once the DSP boot has failed, we can't recover the DSP without rebooting the whole system, even after the system has sufficient memory again. This is a problematic experience for the user. The issue occurs more frequently on systems with 4 GB of RAM.

What the customer is asking of the audio driver is: is there a way to upgrade the flags passed into the allocation so that it never fails? I understand that if we try too hard we hit the watchdog, while giving up breaks audio.

Yes, IMR is supported starting from the CAVS2_5-001-drop-stable branch, so we can't support it on the JSL platform.

Vamshigopal avatar Aug 30 '24 08:08 Vamshigopal

This is 6.1 kernel, right? I think there were some fixes, improvements to make this allocation failure more rare. It is also possible that some backported patch broke the allocation logic (afaik we had that in the past).

If we cannot allocate memory for the firmware download, then that is the least of the system's problems; it will fail in every way humanly possible.

ujfalusi avatar Aug 30 '24 08:08 ujfalusi

@Vamshigopal I think this is same issue as we had with ADL-N -> https://github.com/thesofproject/linux/issues/3915 and https://github.com/thesofproject/linux/issues/3844

For the latter, we submitted

commit a61c7d88d38cf3b9c88cf667c4f8a389a57744d4
Author: Kai Vehmanen <[email protected]>
Date:   Fri Sep 23 18:35:01 2022 +0300

    ALSA: memalloc: use __GFP_RETRY_MAYFAIL for DMA mem allocs

to fix the case. There is a risk the issue can come back, as the solution of trying harder in the audio driver seems to trigger issues in other cases (in low-memory conditions).

kv2019i avatar Aug 30 '24 08:08 kv2019i

This is 6.1 kernel, right? I think there were some fixes, improvements to make this allocation failure more rare. It is also possible that some backported patch broke the allocation logic (afaik we had that in the past).

Yes, the 6.1 Chrome kernel (https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/tags/v6.1.105). If you know of any fixes, please share them and I can check whether they are part of the kernel.

Vamshigopal avatar Aug 30 '24 08:08 Vamshigopal

@Vamshigopal, do you have a timeline for when the reports started to come in, or has this always been there since launch?

ujfalusi avatar Aug 30 '24 08:08 ujfalusi

@Vamshigopal, do you have a timeline for when the reports started to come in, or has this always been there since launch?

As per the customer, they started seeing this issue once they migrated the kernel from v5.4 to v6.1.
v5.4: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-5.4
v6.1: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-6.1

On the v5.4 kernel they didn't see this issue; it appears only on the v6.1 kernel.

Vamshigopal avatar Aug 30 '24 09:08 Vamshigopal

@Vamshigopal I think this is same issue as we had with ADL-N -> #3915 and #3844

Yes @kv2019i, we have similar issues on ADL-N and ADL, but one difference I see here is that on JSL we don't have IMR.

Vamshigopal avatar Aug 30 '24 09:08 Vamshigopal

Ack @Vamshigopal, not using IMR makes this condition easier to hit. https://github.com/tiwai/sound/commit/a61c7d88d38cf3b9c88cf667c4f8a389a57744d4 is worth a try on top of v6.1.

kv2019i avatar Aug 30 '24 09:08 kv2019i

@Vamshigopal, can we close this issue? The patches have been upstream for quite a long time now and they should fix the problem once and for all (famous last words).

ujfalusi avatar Jan 29 '25 15:01 ujfalusi

As we now have robust ways to overcome this issue at multiple levels, I will close the issue.

ujfalusi avatar Feb 28 '25 12:02 ujfalusi