[LNL] multiple cases failing to HDMI playback on SDW configurations

Open kv2019i opened this issue 1 year ago • 3 comments

Some interaction with recently merged kernel and FW PRs has caused a high failure rate in PR testing: https://sof-ci.01.org/sofpr/PR9116/build6350/devicetest/index.html

2024-07-08 13:40:30 UTC [REMOTE_INFO] ===== Testing: (Round: 1/1) (PCM: HDMI1 [hw:0,5]) (Loop: 1/1) =====
2024-07-08 13:40:30 UTC [REMOTE_COMMAND] aplay   -Dhw:0,5 -r 48000 -c 2 -f S16_LE -d 10 /dev/zero -v -q
aplay: set_params:1416: Unable to install hw params:

The gpu_bind should be disabled in sof-dev kernel, so not sure why HDMI playback is attempted.

Related PRs merged recently:

  • https://github.com/thesofproject/sof/pull/9267 (HDMI tplgs changed)
  • https://github.com/thesofproject/linux/pull/5068 (it's the sof_sdw mach driver that is used in failing cases)

As this is seen in PR testing, marking as P1.

kv2019i avatar Jul 08 '24 18:07 kv2019i

FYI @lyakh

kv2019i avatar Jul 08 '24 18:07 kv2019i

Observed in today's daily run https://sof-ci.ostc.intel.com/#/result/planresultdetail/43591?model=LNLM_SDW_AIOC&testcase=check-playback-all-formats on jf-lnlm-rvp-sdw-1

Other LNL configurations are indeed not affected.

[  161.355590] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.0: mclk 19200000 max 4800000 row 50 col 4
[  161.355635] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.3: mclk 19200000 max 4800000 row 50 col 4
[  161.355708] kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: initializing enumeration and init completion for Slave 1
[  161.355718] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.2: mclk 19200000 max 4800000 row 50 col 4
[  161.356138] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at snd_soc_dai_hw_params on iDisp1 Pin: -22
[  161.356276] kernel: snd_sof:sof_pcm_hw_free: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: free stream 5 dir 0
[  161.356561] kernel: snd_sof:sof_pcm_close: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: close stream 5 dir 0
[  161.357248] kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.0: Slave status change: 0x2
[  161.357268] kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-0: Slave attached, programming device number
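
The `-22` at the end of the `snd_soc_dai_hw_params` line above is a negative errno value; on Linux, 22 is `EINVAL`. A minimal sketch for pulling that code out of such a log line follows. The helper name `asoc_error_code` is made up for illustration and is not part of any kernel or SOF tool.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the trailing negative error code of a
 * log line that ends in ": -<number>", or 0 if there is none.
 */
static int asoc_error_code(const char *line)
{
	const char *sep = strrchr(line, ':');
	char *end;
	long val;

	if (!sep)
		return 0;

	/* strtol() skips the space after the colon and parses the sign. */
	val = strtol(sep + 1, &end, 10);
	if (end == sep + 1 || val >= 0)
		return 0;

	/* Accept only trailing whitespace after the number. */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return 0;

	return (int)val;
}
```

Comparing the result against `-EINVAL` identifies the failure class seen on `iDisp1 Pin` without hard-coding the number.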

marc-hb avatar Jul 08 '24 22:07 marc-hb

Could this be caused by some device-specific configuration? It did not happen on ba-lnlm-rvp-sdw-01 in the July 7th (planresultdetail/43565) and July 9th (planresultdetail/43642) daily tests. Also not in https://sof-ci.01.org/softestpr/PR1218/build625/devicetest/index.html

EDIT: failed on ba-lnlm-rvp-sdw-03 in https://sof-ci.01.org/sofpr/PR9276/build6355/devicetest/index.html https://sof-ci.01.org/softestpr/PR1218/build604/devicetest/index.html

jf-lnlm-rvp-sdw-1 in https://sof-ci.01.org/softestpr/PR1218/build600/devicetest/index.html

marc-hb avatar Jul 09 '24 21:07 marc-hb

This issue is still reproducible on latest. Currently the workaround "NO_HDMI_MODE=true" is set in the device environment, so we are not seeing the issue in CI results. cc: @kv2019i @plbossart @lgirdwood

ssavati avatar Jul 15 '24 06:07 ssavati

I'll take a look at this, but FYI to @ujfalusi and @ranj063 in case we need to switch.

kv2019i avatar Jul 16 '24 06:07 kv2019i

Only affecting LNL, TGL/MTL HDMI is working fine?

ujfalusi avatar Jul 16 '24 06:07 ujfalusi

@ujfalusi this is not observed on MTL. I will check on TGL and update

ssavati avatar Jul 16 '24 07:07 ssavati

I think @ujfalusi @bardliao @plbossart there's a problem in the sof_sdw mach driver handling the case where the display driver is not available and no HDMI PCMs are available:

Jul 08 13:41:39 kernel: snd_soc_sof_sdw:sof_card_dai_links_create: sof_sdw sof_sdw: sdw 5, ssp 0, dmic 0, hdmi 0, bt: 0

But topology has (as it should) the HDMI nodes:

Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI1
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI2
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI3

kv2019i avatar Jul 16 '24 07:07 kv2019i

I don't understand how the display driver became unavailable?

plbossart avatar Jul 16 '24 07:07 plbossart

we have this in the configuration: https://sof-ci.ostc.intel.com/#/result/planresultdetail/43591?model=LNLM_SDW_AIOC&testcase=verify-kernel-boot-log

/sys/module/snd_hda_core/parameters/gpu_bind:0

why is this value cleared?

static int gpu_bind = -1;
module_param(gpu_bind, int, 0644);
MODULE_PARM_DESC(gpu_bind, "Whether to bind sound component to GPU "
			   "(1=always, 0=never, -1=on nomodeset(default))");

looks like a stale CI configuration to me, if we want to test HDMI this should not be cleared.
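
A minimal user-space sketch for checking this setting on a test device, assuming only the sysfs path and the three values listed in the `MODULE_PARM_DESC` text above. The function names are hypothetical, and the returned strings simply repeat the parameter description rather than interpreting it.

```c
#include <stdio.h>

/* Map a gpu_bind value to the wording used in MODULE_PARM_DESC. */
static const char *gpu_bind_meaning(int value)
{
	switch (value) {
	case 1:
		return "always";
	case 0:
		return "never";
	case -1:
		return "on nomodeset (default)";
	default:
		return "unknown";
	}
}

/*
 * Read a single integer from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed;
 * *out is left untouched on failure.
 */
static int read_int_param(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int tmp, ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", &tmp) == 1;
	fclose(f);
	if (!ok)
		return -1;
	*out = tmp;
	return 0;
}
```

Calling `read_int_param("/sys/module/snd_hda_core/parameters/gpu_bind", &v)` and printing `gpu_bind_meaning(v)` would report "never" for the `0` shown in the boot log check.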

plbossart avatar Jul 16 '24 07:07 plbossart

@plbossart wrote:

I don't understand how the display driver became unavailable?

It wasn't available in sof-dev yet for this platform (not marked as stable yet in the kernel; this can be overridden in the device configuration, so let me go and check this particular device).

UPDATE: we still have commit 003bd609021b9a6205db19d7ef163101856071b5 in sof-dev, and we can't remove it until we pull in a stable version of the xe support or change the test device configurations to apply a force probe.

kv2019i avatar Jul 16 '24 07:07 kv2019i

I think @ujfalusi @bardliao @plbossart there's a problem in the sof_sdw mach driver handling the case where the display driver is not available and no HDMI PCMs are available:

Jul 08 13:41:39 kernel: snd_soc_sof_sdw:sof_card_dai_links_create: sof_sdw sof_sdw: sdw 5, ssp 0, dmic 0, hdmi 0, bt: 0

But topology has (as it should) the HDMI nodes:

Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI1
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI2
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI3

@kv2019i, we should have dummy links for the HDMI PCMs to probe. They will not work, but they need to be there to be able to load the topology.

ujfalusi avatar Jul 16 '24 09:07 ujfalusi

We do have dummy links, the problem is not the probe:

	for (i = 0; i < hdmi_num; i++) {
		char *name = devm_kasprintf(dev, GFP_KERNEL, "iDisp%d", i + 1);
		char *cpu_dai_name = devm_kasprintf(dev, GFP_KERNEL, "iDisp%d Pin", i + 1);
		char *codec_name, *codec_dai_name;

		if (intel_ctx->hdmi.idisp_codec) {
			codec_name = "ehdaudio0D2";
			codec_dai_name = devm_kasprintf(dev, GFP_KERNEL,
							"intel-hdmi-hifi%d", i + 1);
		} else {
			codec_name = "snd-soc-dummy";
			codec_dai_name = "snd-soc-dummy-dai";
		}

		ret = asoc_sdw_init_simple_dai_link(dev, *dai_links, be_id, name,
						    1, 0, // HDMI only supports playback
						    cpu_dai_name, platform_component->name,
						    ARRAY_SIZE(platform_component),
						    codec_name, codec_dai_name,
						    i == 0 ? sof_sdw_hdmi_init : NULL, NULL);
		if (ret)
			return ret;

		(*dai_links)++;
	}

It's the error on hw_params that needs to be root-caused.

plbossart avatar Jul 16 '24 09:07 plbossart

the debug log is misleading

	dev_dbg(dev, "sdw %d, ssp %d, dmic %d, hdmi %d, bt: %d\n",
		sdw_be_num, ssp_num, dmic_num,
		intel_ctx->hdmi.idisp_codec ? hdmi_num : 0, bt_num);

hdmi_num is 4 on TGL and 3 on all other devices, so we do create 3+ links.
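
The mismatch can be shown in isolation. This is a simplified user-space sketch, not the driver code: the two function names are invented, and the only facts taken from the thread are the ternary in the `dev_dbg()` call and that the link creation loop runs `hdmi_num` times whether or not the iDisp codec is present.

```c
#include <stdbool.h>

/* Value printed in the "hdmi %d" field of the debug log. */
static int hdmi_links_reported(bool idisp_codec, int hdmi_num)
{
	return idisp_codec ? hdmi_num : 0;
}

/*
 * Number of iDisp links the creation loop adds. The loop bound is
 * hdmi_num regardless of the codec; without the codec the links
 * just point at the dummy codec instead.
 */
static int hdmi_links_created(bool idisp_codec, int hdmi_num)
{
	(void)idisp_codec;
	return hdmi_num;
}
```

With `idisp_codec` false and `hdmi_num` equal to 3, the log reports 0 while 3 dummy-backed links exist, which matches the "hdmi 0" line next to the three HDMI PCMs loaded from topology.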

plbossart avatar Jul 16 '24 10:07 plbossart

The HDMI PCM never worked when there was no HDMI hardware; it has been like this with HDA devices as well. The hw_params fails because of the missing real DAI.

ujfalusi avatar Jul 16 '24 10:07 ujfalusi

That's not true @ujfalusi, this has been working but was broken at some point. It seems some of the changes to HDA DAI ops now return -EINVAL when the dummy codec driver is connected. This DID work in the past.

UPDATE: I can confirm this is broken on TGL as well if HDMI is disabled via codec_mask. This did work in the past, will bisect to see where this got broken.

kv2019i avatar Jul 16 '24 10:07 kv2019i

@kv2019i, I'm not sure about the past, but now it is not working on TGL either:

[   32.290196] snd_soc_core:dpcm_be_dai_hw_params:  iDisp1: ASoC: hw_params BE iDisp1
[   32.290205] sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at snd_soc_dai_hw_params on iDisp1 Pin: -22
[   32.290212] snd_soc_core:dpcm_be_dai_hw_params:  HDMI1: ASoC: dpcm_be_dai_hw_params() failed at iDisp1 (-22)
[   32.290219] snd_soc_core:dpcm_fe_dai_hw_free:  HDMI1: ASoC: hw_free FE HDMI1

ujfalusi avatar Jul 16 '24 10:07 ujfalusi

  • https://github.com/thesofproject/linux/pull/4639
  • https://github.com/thesofproject/linux/pull/4649

and:

  • https://github.com/thesofproject/linux/pull/4639#issuecomment-1776737040
  • https://github.com/thesofproject/linux/pull/4639#issuecomment-1776984472

ujfalusi avatar Jul 16 '24 10:07 ujfalusi

I'm sure about the past :) -- but this is not just for debug, this is an actual product config for HDA where the Intel GPU is disabled for one reason or another. Granted, most of these laptops use the non-SOF driver, but there are actual product configs with dmic (=SOF) and some other GPU, so this dummy codec construct must work!

kv2019i avatar Jul 16 '24 10:07 kv2019i

@kv2019i, I trust your memory. It did not work on 18.09.2023: https://github.com/thesofproject/linux/issues/4594#issuecomment-1722865534

Can this be the reason: https://github.com/thesofproject/linux/pull/4659 ? We don't register HDMI dais when there is no HDMI, before that PR we registered the dais multiple times (analog would register the HDMI also and HDMI would register the analog), causing warnings.

ujfalusi avatar Jul 16 '24 10:07 ujfalusi

No, it's not #4659 -- this is probably older.

I'll lower the priority now, as this is not hit at card probe and normal applications will not open the HDMI PCM if no monitor is detected (and no monitor ever will be on these devices). So the remaining open item is the PulseAudio/PipeWire habit of opening the PCMs and doing a hw_params query. Maybe -EINVAL is ok for this case as well (and my memory really malfunctions here). If so, we can close this.

kv2019i avatar Jul 16 '24 12:07 kv2019i

Tested with upstream 6.8 kernel and pipewire 0.3.79 (versions used in 24.04 LTS): the -EINVAL errors at pipewire start are handled correctly and the rest of the audio functionality is ok. So I'll close this as works-as-expected, and we can track the test device configuration issues elsewhere.
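
For illustration only, the tolerant behaviour described here can be sketched as a probe loop that skips PCMs rejecting hw_params with -EINVAL and keeps the rest. This is not PipeWire's code: `count_usable_pcms`, the callback type and `fake_hw_params` are all hypothetical, and the fake probe merely mimics the `hw:0,5` failure from this issue instead of calling into ALSA.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe callback: returns 0 or a negative errno. */
typedef int (*hw_params_fn)(const char *pcm);

/* Stand-in for a real hw_params attempt; only hw:0,5 fails. */
static int fake_hw_params(const char *pcm)
{
	return strcmp(pcm, "hw:0,5") == 0 ? -EINVAL : 0;
}

/*
 * Probe each PCM and count the usable ones. A PCM that rejects
 * hw_params with -EINVAL is skipped; any other error aborts the
 * scan and is returned to the caller.
 */
static int count_usable_pcms(const char *const *pcms, size_t n,
			     hw_params_fn probe)
{
	int usable = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = probe(pcms[i]);

		if (ret == -EINVAL)
			continue;
		if (ret < 0)
			return ret;
		usable++;
	}

	return usable;
}
```

Given a list such as `hw:0,0`, `hw:0,5`, `hw:0,2` (the first and last are made-up device names), the loop reports two usable PCMs and the dummy-backed HDMI PCM is simply left out.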

kv2019i avatar Jul 16 '24 13:07 kv2019i