linux icon indicating copy to clipboard operation
linux copied to clipboard

WHL: UpExtreme: iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16

Open plbossart opened this issue 1 year ago • 31 comments

Reported by internal tests, e.g.

#/result/planresultdetail/38998?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pipeline-playback-50

This is concerning since it's just a generic HDaudio device...No idea when it started to appear.

[ 2576.892664] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
[ 2576.893024] kernel: snd_sof_intel_hda_common:hda_set_stream_data_offset: sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 790528
[ 2576.893168] kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
[ 2576.893191] kernel:  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
[ 2576.893204] kernel:  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
[ 2576.893215] kernel:  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16
Test Case URL: https://github.com/thesofproject/sof-test/tree/main/test-case/MODEL=WHL_UPEXT_HDA-ipc3
cmd: TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50
====================
framework load: ssh  [email protected]  'nohup bash -c "TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50 2>&1 | nc -N 10.54.29.142 50789" >/dev/null &'
====================
framework run: 'netcat -l -k 50789'
====================
2024-03-19 01:19:35 UTC [REMOTE_INFO] ktime=2576 sof-test PID=18830: starting
2024-03-19 01:19:35 UTC [REMOTE_INFO] Starting /usr/local/bin/sof-logger -t -l /etc/sof/sof-cnl.ldc > /home/ubuntu/sof-test/logs/multiple-pipeline/2024-03-19-01:19:35-20621/slogger.txt &
2024-03-19 01:19:35 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg to run the test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg -f 'type:playback & ~pcm:Amplifier Reference' -b ' pcm:HDA Digital' -s 0 -e
2024-03-19 01:19:35 UTC [REMOTE_INFO] ===== Testing: (Loop: 1/50) =====
2024-03-19 01:19:35 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg to run the test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg -f 'type:playback ' -b ' pcm:HDA Digital' -s 0 -e
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDA Analog [hw:0,0]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI1 [hw:0,3]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI2 [hw:0,4]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI3 [hw:0,5]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
aplay: main:831: audio open error: Device or resource busy

@ujfalusi @kv2019i FYI

cc:

  • #4192 - August 2023 duplicate

plbossart avatar Mar 19 '24 20:03 plbossart

I can't reproduce this issue on my own UpExtreme board which was collecting dust for a while.

I have zero issues when pulseaudio is disabled (as in mv /usr/bin/pulseaudio /usr/bin/_pulseaudio)

TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg sof-test/test-case/run-all-tests.sh -l1

...


PASS: tplg-binary; pcm_list; playback-d100l1r1; capture-d100l1r1; playback-d1l100r1; capture_d1l100r1; playback_d1l1r50; capture_d1l1r50; speaker; pause-resume-playback; pause-resume-capture; volume; signal-stop-start-playback; signal-stop-start-capture; simultaneous-playback-capture; multiple-pipeline-playback; multiple-pipeline-capture; multiple-pause-resume; kmod-load-unload; kmod-load-unload-after-playback; suspend-resume; suspend-resume-with-playback; suspend-resume-with-capture;
SKIP: sof-logger; ipc-flood; xrun-injection-playback; xrun-injection-capture; volume_levels;

 test end with 0 failed tests

With pulseaudio nothing works, so my money is on a configuration problem for this board.

@fredoh9 @marc-hb any ideas?

plbossart avatar Mar 19 '24 21:03 plbossart

I can reproduce on this particular device. There is no pulseaudio running (I think we check that somewhere?)

I can see this error at boot and then again every time the test fails:


TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg ~/sof-test/test-case/multiple-pipeline.sh -f p  -s

Mar 19 22:52:16 jf-whl-upext-hda-2 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16

@kv2019i any HDMI idea/suggestion?

@fredoh9 could you (re?)-add verify-kernel-boot-log.sh at the start of the stable-v2.2 test plan?

I also spotted this:

Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: Failed to start Application launched by gnome-session-binary.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: app-gnome-xdg\x2duser\x2ddirs-825.scope: Failed to add PIDs to scope's control group: No such process
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: app-gnome-xdg\x2duser\x2ddirs-825.scope: Failed with result 'resources'.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: Failed to start Application launched by gnome-session-binary.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: [email protected]: Start request repeated too quickly.

FYI:

sudo grep -i connected /sys/kernel/debug/dri/0/i915_display_info
[CONNECTOR:95:DP-1]: status: connected
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected
--------- Printing module parameters ----------
/sys/module/snd_intel_dspcfg/parameters/dsp_driver:3
/sys/module/snd_sof/parameters/ipc4_ignore_cpc:N
/sys/module/snd_sof/parameters/sof_debug:1
/sys/module/snd_sof_intel_hda/parameters/codec_mask:-1
/sys/module/snd_sof_intel_hda_common/parameters/always_enable_dmi_l1:N
/sys/module/snd_sof_intel_hda_common/parameters/disable_rewinds:N
/sys/module/snd_sof_intel_hda_common/parameters/enable_trace_D0I3_S0:N
/sys/module/snd_sof_intel_hda_common/parameters/position_quirk:1
/sys/module/snd_sof_intel_hda_common/parameters/sof_use_tplg_nhlt:N
/sys/module/snd_sof_intel_hda_generic/parameters/dmic_num:-1
/sys/module/snd_sof_intel_hda_generic/parameters/hda_model:(null)
/sys/module/snd_sof_intel_hda_generic/parameters/mclk_id:-1
/sys/module/snd_sof_intel_hda_generic/parameters/sdw_clock_stop_quirks:8
/sys/module/snd_sof_intel_hda_generic/parameters/use_msi:Y
/sys/module/snd_sof_pci/parameters/fw_filename:(null)
/sys/module/snd_sof_pci/parameters/fw_path:(null)
/sys/module/snd_sof_pci/parameters/ipc_type:-1
/sys/module/snd_sof_pci/parameters/lib_path:(null)
/sys/module/snd_sof_pci/parameters/sof_pci_debug:0
/sys/module/snd_sof_pci/parameters/tplg_filename:(null)
/sys/module/snd_sof_pci/parameters/tplg_path:(null)
/sys/module/snd_sof_probes/parameters/enable:N
/sys/module/snd_hda_codec/parameters/dump_coef:-1
/sys/module/snd_hda_codec_hdmi/parameters/enable_acomp:Y
/sys/module/snd_hda_codec_hdmi/parameters/enable_all_pins:N
/sys/module/snd_hda_codec_hdmi/parameters/enable_silent_stream:Y
/sys/module/snd_hda_codec_hdmi/parameters/static_hdmi_pcm:N
/sys/module/snd_hda_core/parameters/gpu_bind:-1
----------------------------------------
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: UPW1AM18
	Release Date: 06/12/2020
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16 MB
	Characteristics:
		PCI is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		BIOS ROM is socketed
		EDD is supported
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 kB floppy services are supported (int 13h)
		3.5"/2.88 MB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		ACPI is supported
		USB legacy is supported
		BIOS boot specification is supported
		Targeted content distribution is supported
		UEFI is supported
	BIOS Revision: 5.13

marc-hb avatar Mar 19 '24 23:03 marc-hb

Thanks @ujfalusi and @plbossart for spotting this:

Last successful runs were on sh-whl-upext-hda-01 (last successful run was 38874, March 14th)

First failing run was 38915 on jf-whl-upext-hda-4 (March 15th)

38998 failed on jf-whl-upext-hda-2 (March 19th)

marc-hb avatar Mar 20 '24 15:03 marc-hb

The failing aplay seems to be always on HDMI3 == hw:0,5

One thing I know @fredoh9 did in JF is to use HDMI "stubs" to fake monitors and avoid Ubuntu issues https://github.com/thesofproject/sof-test/discussions/998

Maybe this wasn't done in SH?

Another thing: this can be reproduced easily but sometimes a fresh reboot is required to reproduce!

marc-hb avatar Mar 20 '24 16:03 marc-hb

Needs more investigation. The codec driver can return -EBUSY if there are no free slots. It's possible something has changed in uptream that would trigger this.

kv2019i avatar Mar 20 '24 16:03 kv2019i

The last upstream merge was two weeks ago, the first error happened a week ago after the last upstream merge.

plbossart avatar Mar 20 '24 16:03 plbossart

The failing aplay seems to be always on HDMI3 == hw:0,5

Correction: the failure always happens on the LAST HDMI port opened concurrently. I reversed the 0 1 2 3 device order in the test to 3 2 1 0 and now it is HDMI1 that fails:

2024-03-20 17:45:09 UTC [INFO] Testing: HDMI3 [hw:0,5]
2024-03-20 17:45:09 UTC [COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 17:45:11 UTC [INFO] Testing: HDMI2 [hw:0,4]
2024-03-20 17:45:11 UTC [COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 17:45:13 UTC [INFO] Testing: HDMI1 [hw:0,3]
2024-03-20 17:45:13 UTC [COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
aplay: main:831: audio open error: Device or resource busy
2024-03-20 17:45:15 UTC [INFO] Testing: HDA Analog [hw:0,0]
2024-03-20 17:45:15 UTC [COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  iDisp1: ASoC: error at __soc_pcm_open on iDisp1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  HDMI1: ASoC: error at dpcm_be_dai_startup on HDMI1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  HDMI1: ASoC: error at dpcm_fe_dai_startup on HDMI1: -16

marc-hb avatar Mar 20 '24 17:03 marc-hb

I can't reproduce this issue on my own UpExtreme board which was collecting dust for a while.

@plbossart I think I may have found why you cannot reproduce: the display needs to be turned ON to reproduce. Can you try again with the display ON?

You can see whether the screen is on or not like this:

sudo grep -B2 -e enable -e active -e connected -e audio /sys/kernel/debug/dri/0/i915_display_info

You can control the timeout like this:

gsettings get org.gnome.desktop.session idle-delay  # default 5 minutes for me
gsettings set org.gnome.desktop.session idle-delay 60 # in seconds. There seems to be a 20s minimum?

You can light up the monitor again like this:

sudo systemctl restart gdm3

When the screen goes off, the logs show the DSP going to D3:


Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x90050000: GLB_TRACE_MSG: DMA_FREE
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_reset_enter: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_power_down: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_is_enabled: sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask f
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:snd_sof_pci_update_bits_unlocked: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000000 at  00000044
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:snd_sof_pci_update_bits_unlocked: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIW: 00000010 at  00000044
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda:hda_codec_i915_display_power: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D3
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state change: 7 -> 0

One thing I know @fredoh9 did in JF is to use HDMI "stubs" to fake monitors and avoid Ubuntu issues https://github.com/thesofproject/sof-test/discussions/998

Maybe this wasn't done in SH?

I just checked and sh-whl-upext-hda-01 has no HDMI stub:

sudo grep -B2 -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info

Connector info
--------------
[CONNECTOR:95:DP-1]: status: disconnected
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected

cc: @ssavati

marc-hb avatar Mar 20 '24 18:03 marc-hb

Issue reproduces exactly the same with Ubuntu kernels 6.2.0-34-generic and 6.5.0-26-generic, so clearly not any recent regression.

Display OFF: test always passes Display ON: test always fails the same way

Both lsmod | grep snd and lsof | grep dev/snd show nothing busy before the test runs and fails. Usage count for both snd_soc_skl_hda_dsp and snd_sof_pci_intel_cnl is zero before the test runs and fails.

marc-hb avatar Mar 20 '24 20:03 marc-hb

yeah, my tests were in headless mode. I just run the tests without any displays connected. I'll retry later today and try to see what happens.

plbossart avatar Mar 20 '24 22:03 plbossart

I connected a portable display (the one I use when I travel), and I see no issues. HDMI1 [hw:0,3] works fine, I can hear sound being played without any distortion.

root@plb-UP-WHL01:~# sudo grep -B2 -e audio -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info
---------
[CRTC:51:pipe A]:
	uapi: enable=yes, active=yes, mode="": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x0 0x5
	hw: enable=yes, active=yes
--
	underrun reporting: cpu=yes pch=yes
[CRTC:72:pipe B]:
	uapi: enable=no, active=no, mode="": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
	hw: enable=no, active=no
--
	underrun reporting: cpu=yes pch=yes
[CRTC:93:pipe C]:
	uapi: enable=no, active=no, mode="": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
	hw: enable=no, active=no
--
Connector info
--------------
[CONNECTOR:95:DP-1]: status: connected
--
		"640x480": 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
		"640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected

Can we try to see if the tests pass with a real monitor and not the 'HDMI stubs'? I wonder if those gizmos cause problems.

plbossart avatar Mar 20 '24 22:03 plbossart

I tried this based on @plbossart 's advice but it does not show anything "busy" while the display is ON

sudo systemctl restart gdm3

sudo grep -B2 -e audio -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info

 => display is ON

ubuntu@jf-whl-upext-hda-4:~$ grep ^ $(find /proc/asound/ -name hw_params)
/proc/asound/card0/pcm0c/sub0/hw_params:closed
/proc/asound/card0/pcm0p/sub0/hw_params:closed
/proc/asound/card0/pcm3p/sub0/hw_params:closed
/proc/asound/card0/pcm4p/sub0/hw_params:closed
/proc/asound/card0/pcm5p/sub0/hw_params:closed
/proc/asound/card0/pcm6c/sub0/hw_params:closed
/proc/asound/card0/pcm7c/sub0/hw_params:closed

 => nothing "active" yet test fails as usual

marc-hb avatar Mar 20 '24 23:03 marc-hb

HDMI1 [hw:0,3] works fine,

The failure is NOT port-specific. It fails on the LAST HDMI device opened concurrently.

For instance, if you change the test like this

--- a/test-case/multiple-pipeline.sh
+++ b/test-case/multiple-pipeline.sh
@@ -104,7 +104,7 @@ func_run_pipeline_with_type()
         idx_lst=($(seq 0 $((PIPELINE_COUNT - 1)) | sed 's/ /\n/g' | shuf | xargs))
     fi
 
-    for idx in "${idx_lst[@]}"
+    for idx in 3 1 2 0
     do
         channel=$(func_pipeline_parse_value "$idx" channel)
         rate=$(func_pipeline_parse_value "$idx" rate)

then HDMI2 fails:

2024-03-20 23:13:21 UTC [INFO] Testing: HDMI3 [hw:0,5]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 23:13:21 UTC [INFO] Testing: HDMI1 [hw:0,3]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 23:13:21 UTC [INFO] Testing: HDMI2 [hw:0,4]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
aplay: main:831: audio open error: Device or resource busy
2024-03-20 23:13:21 UTC [INFO] Testing: HDA Analog [hw:0,0]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  iDisp2: ASoC: error at __soc_pcm_open on iDisp2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  HDMI2: ASoC: error at dpcm_be_dai_startup on HDMI2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  HDMI2: ASoC: error at dpcm_fe_dai_startup on HDMI2: -16

I can hear sound being played without any distortion.

I wonder if those gizmos cause problems.

Can you try with a monitor that does NOT support audio? Pretty sure these devices do NOT support audio (how can we tell?)

marc-hb avatar Mar 20 '24 23:03 marc-hb

Can you try with a monitor that does NOT support audio? Pretty sure these devices do NOT support audio (how can we tell?)

I don't have one! I made sure all my displays can be tested with audio....

Can you try this patch https://github.com/thesofproject/linux/pull/4866 to try to get more information on what happens? I think it's a codec issue, not something related to SOF proper.

plbossart avatar Mar 20 '24 23:03 plbossart

Additional logs shows that it's indeed a codec issue

[   65.788307] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: failed -16
[   65.788323] snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
static int hdac_hda_dai_open(struct snd_pcm_substream *substream,
			     struct snd_soc_dai *dai)
{
	....

	ret = hda_stream->ops.open(hda_stream, hda_pvt->codec, substream);
	if (ret < 0)
		dev_err(dai->dev, "%s: failed %d\n", __func__, ret);

plbossart avatar Mar 21 '24 16:03 plbossart

Pretty sure these devices do NOT support audio (how can we tell?)

I stand corrected:

sudo systemctl restart gdm3 # turns display ON

sudo grep -B6 audio /sys/kernel/debug/dri/0/i915_display_info

--------------
[CONNECTOR:95:DP-1]: status: connected
	physical dimensions: 510x290mm
	subpixel order: Unknown
	CEA rev: 3
	DPCD rev: 12
	audio support: yes

marc-hb avatar Mar 21 '24 17:03 marc-hb

Additional traces do show it's indeed a 'codec' issue

[   22.071314] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: start
[   22.071318] hdmi_pcm_open: start
[   22.071321] hdmi_pcm_open_no_pin: start
[   22.071322] hdmi_pcm_open_no_pin: hdmi_choose_cvt failed
[   22.071324] hdmi_pcm_open:  hdmi_pcm_open_no_pin returned -16
[   22.071326] hdmi_pcm_open: end
[   22.071327] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: failed -16
[   22.071332] snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
[   22.071337]  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
[   22.071341]  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
[   22.071344]  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16
static int hdmi_pcm_open_no_pin(struct hda_pcm_stream *hinfo,
			 struct hda_codec *codec,
			 struct snd_pcm_substream *substream)
{
....
	err = hdmi_choose_cvt(codec, -1, &cvt_idx, false);
	if (err) {
		pr_info("%s: hdmi_choose_cvt failed\n", __func__);
		return err;
	}

No idea what this code does...

plbossart avatar Mar 21 '24 17:03 plbossart

static int hdmi_choose_cvt(struct hda_codec *codec,
			   int pin_idx, int *cvt_id,
			   bool silent)
{
...
	/* No free converters */
	if (cvt_idx == spec->num_cvts) {
		pr_info("%s: no free converters\n", __func__);
		return -EBUSY;
	}

This is where I will stop my investigation for now... @kv2019i @ujfalusi may have a better understanding that me on this issue.

I think it's a corner case coming from the type of 'HDMI stubs' connected to the Up Extreme DUTs.

Still that's concerning that we've had this problem for a while and never even knew about it.... Oh well.

plbossart avatar Mar 21 '24 18:03 plbossart

FWIW, here's the diffs I used

cvt.diff.txt

plbossart avatar Mar 21 '24 18:03 plbossart

Still that's concerning that we've had this problem for a while and never even knew about it.... Oh well.

Not surprising at all considering this requires some (special?) type of monitor to be connected AND powered on!

marc-hb avatar Mar 21 '24 18:03 marc-hb

Simpler test script:

aplay -Dhw:0,3 -c2 -r48000 -fS16_LE /dev/zero &
aplay -Dhw:0,4 -c2 -r48000 -fS16_LE /dev/zero &
aplay -Dhw:0,5 -c2 -r48000 -fS16_LE /dev/zero

So this is not an issue with open/close happening concurrently, more that the third stream isn't supported for some reason.

plbossart avatar Mar 21 '24 18:03 plbossart

Doesn't happen on latest daily test on jf-whl-upext-hda-3, not sure what changed?

plbossart avatar Mar 25 '24 20:03 plbossart

@fredoh9 was there a change in hardware configs? The daily tests using jf-whl-upext-hda-3 don't show any issues.

plbossart avatar Mar 27 '24 21:03 plbossart

I never reproduced on jf-whl-upext-hda-3, did anyone or anything? I reproduced only with jf-...-hda-2 and jf-...-hda-4.

It could have a different BIOS or different model of HDMI plug. I'll have a look when I get back to JF.

marc-hb avatar Mar 28 '24 15:03 marc-hb

Indeed the last test used jf-whl-upext-hda-4 and the issue is back. Those device-to-device variations are hard to track..

plbossart avatar Mar 28 '24 21:03 plbossart

So the reason why jf-whl-upext-hda-3 passed and the others failed is: jf-whl-upext-hda-3 is using a different HDMI stub that does not fake audio support. As soon as I swapped the HDMI stubs, the results followed the stubs.

BUT, I also tested the failing stub on my APL UP2 and TGL UP xtreme and it did NOT cause any problem. So there is a problem specific to this WHL model. Reminder: the failure is NOT on the port connected to the HDMI stub. The failure is always on the last HDMI port used concurrently.

Instead of that HDMI stub, I connected a monitor with audio to a failing board and found no failure and no audio issue.

Here's the EDID for the HDMI stub that triggers the failure:

sudo apt install read-edid
sudo get-edid | parse-edid 


	Identifier "             "
	ModelName "             "
	VendorName "CLB"
	# Monitor Manufactured week 40 of 2013
	# EDID version 1.3

marc-hb avatar Mar 30 '24 01:03 marc-hb

Very interesting and relevant HDMI audio facts learned after discussing with the local expert (thanks Clint!)

  • Don't use parse-edid because its output is incomplete. Notably: parse-edid does not show audio fields! Use edid-decode instead, which comes from a separate Debian/Ubuntu package.

Do this:

sudo apt install edid-decode

sudo get-edid | edid-decode

  Underscans IT Video Formats by default
  Basic audio support
  Supports YCbCr 4:4:4
  ...
  • In HDMI, audio is MANDATORY. If a monitor has one HDMI input, then it MUST have at least an audio jack (EDID is not just for HDMI).

  • Hence the other, HDMI stub that hides this failure is NOT HDMI COMPLIANT! Because its EDID is missing any audio support (it's also reporting really weird display modes but that's hopefully irrelevant)

I've been looking in the labs for a 3rd type of HDMI stub but I didn't find any yet. Ideally a compliant one with audio support.

marc-hb avatar Apr 03 '24 20:04 marc-hb

I've been looking in the labs for a 3rd type of HDMI stub but I didn't find any yet. Ideally a compliant one with audio support.

No way to find a 3rd type.

We can silence this failure by using on WHL only the (non compliant...) HDMI stub that does not support audio. Deal? It would only be hiding the failure, not solving it.

marc-hb avatar Apr 03 '24 22:04 marc-hb

We can silence this failure by using on WHL only the (non compliant...) HDMI stub that does not support audio. Deal? It would only be hiding the failure, not solving it.

Done on jf-whl-*-hda2,3,4

marc-hb avatar Apr 04 '24 23:04 marc-hb

I am lost here.

We have an issue with a HDMI compliant device which exposes audio, but "how compliant" is it if we get this error all the time?

plbossart avatar Apr 05 '24 15:04 plbossart