
LNL HDA pause-release issue

Open plbossart opened this issue 1 year ago • 6 comments

Started seeing this sort of issue today:

https://sof-ci.01.org/linuxpr/PR5044/build3423/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-capture-100

(100/100) Wait for 172 ms before resume

declare -- cmd="journalctl_cmd --since=@1717691003"
2024-06-06 16:23:56 UTC [REMOTE_INFO] Entering expect script with:
      arecord   -D hw:0,6 -r 48000 -c 4 -f S32_LE -vv -i /dev/null -q
spawn arecord -D hw:0,6 -r 48000 -c 4 -f S32_LE -vv -i /dev/null -q
Hardware PCM card 0 'sof-hda-dsp' device 6 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 4
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 32
  buffer_size  : 24000
  period_size  : 6000
  period_time  : 125000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 6000
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 24000
  silence_threshold: 0
  silence_size : 0
  boundary     : 6755399441055744000
  appl_ptr     : 0
  hw_ptr       : 0

##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
...
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
##################################################+| MAX
2024-06-06 16:24:06 UTC [REMOTE_INFO] Starting func_exit_handler(1)
2024-06-06 16:24:06 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2024-06-06 16:24:06 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-pause-resume.sh
2024-06-06 16:24:06 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-06-06 16:24:06 UTC [REMOTE_INFO] nlines=1926 /home/ubuntu/sof-test/logs/check-pause-resume/2024-06-06-16:23:23-6831/mtrace.txt
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' -e OSError /home/ubuntu/sof-test/logs/check-pause-resume/2024-06-06-16:23:23-6831/mtrace.txt
2024-06-06 16:24:06 UTC [REMOTE_INFO] ktime=583 sof-test PID=8280: ending
2024-06-06 16:24:06 UTC [REMOTE_INFO] Test Result: FAIL!

Nothing blatantly wrong in the dmesg log or mtrace.

@ujfalusi @fredoh9 @ssavati @marc-hb Is this a regression?
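As a sanity check on the `arecord` setup dump above, the period and buffer durations can be recomputed from the frame counts and the rate. This is only a minimal sketch: `alsa_timing` is a hypothetical helper written for this illustration, not part of sof-test or alsa-utils, and it assumes S32_LE means 4 bytes per sample.

```python
def alsa_timing(rate_hz, channels, sample_bytes, period_frames, buffer_frames):
    """Derive durations and sizes from ALSA hw_params-style values.

    Hypothetical helper for illustration only; not part of sof-test.
    """
    frame_bytes = channels * sample_bytes  # one frame = one sample per channel
    return {
        "period_time_us": period_frames * 1_000_000 // rate_hz,
        "buffer_time_us": buffer_frames * 1_000_000 // rate_hz,
        "periods_per_buffer": buffer_frames // period_frames,
        "frame_bytes": frame_bytes,
        "buffer_bytes": buffer_frames * frame_bytes,
    }


# Values copied from the setup dump in this report:
# rate 48000, 4 channels, S32_LE (4 bytes), period_size 6000, buffer_size 24000.
timing = alsa_timing(48000, 4, 4, 6000, 24000)
print(timing)
```

The computed period time matches the `period_time : 125000` reported in the log, so the dump is at least self-consistent. The 172 ms wait in the failing iteration is longer than one period and shorter than the whole buffer, which by itself does not explain the failure.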

plbossart avatar Jun 07 '24 12:06 plbossart

To be honest, I cannot tell what the reason for the failure is.

ujfalusi avatar Jun 07 '24 13:06 ujfalusi

Also spotted earlier in https://github.com/intel-innersource/drivers.audio.ci.sof-framework/issues/566#issuecomment-2146091310

I don't know what's going on.

I know that this test should first be fixed. I approved this fix from @fredoh9 a long time ago but @plbossart you still had reservations:

  • https://github.com/thesofproject/sof-test/pull/931

(I forgot everything about 931)

cc:

  • https://github.com/thesofproject/linux/issues/3766
  • internal issue # 302

marc-hb avatar Jun 07 '24 16:06 marc-hb

Is this a duplicate?

  • https://github.com/thesofproject/sof/issues/9191

marc-hb avatar Jun 07 '24 16:06 marc-hb

Seen again in https://sof-ci.01.org/linuxpr/PR5064/build3677/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-capture-100

@kv2019i another problem to track for 2.10...

plbossart avatar Jun 14 '24 12:06 plbossart

@plbossart Ack. Liam moved #9191 to 2.11; pause-resume issues are not blocking the 2.10 release.

kv2019i avatar Jun 14 '24 12:06 kv2019i

More recent reproduction today: https://sof-ci.01.org/sofpr/PR9235/build5580/devicetest/index.html?model=LNLM_RVP_HDA&testcase=multiple-pause-resume-50

Again in June 17th daily 42633?model=LNLM_RVP_HDA&testcase=multiple-pause-resume-50

marc-hb avatar Jun 15 '24 00:06 marc-hb

So this test used to time out because "MAX" didn't match anything expected by the Expect script. I rewrote that script and named it case-lib/apause.exp in https://github.com/thesofproject/sof-test/pull/1218 which was merged today. For now, the rewrite will neither time out nor fail on "MAX" because I wanted the first script version to be "generous" and to assume it could be just a problem with ALSA settings. I needed this so the rewrite could be tested and merged without polluting results too much. But obviously, "MAX" is not always just a problem with ALSA settings. For instance, the initial "MAX pop" TGL issue #3766 does not look like a problem with ALSA settings (please prove me wrong).

That's why "MAX" can be turned into an error with just a one-line change in the new case-lib/apause.exp script. I intend to submit that change after a couple daily test runs.
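To illustrate the "warning now, error later" idea, here is a minimal Python sketch. It is not the actual logic of case-lib/apause.exp, which is an Expect script; `MAX_LINE`, `classify_vu_output` and the `max_is_error` flag are hypothetical names, and the pattern is inferred only from the VU meter lines shown in the log in this issue.

```python
import re

# Assumption: a saturated "arecord -vv" VU meter line looks like
# "####...####+| MAX", as in the log quoted in this issue.
MAX_LINE = re.compile(r"^#*\+?\s*\|\s*MAX\s*$")


def classify_vu_output(lines, max_is_error=False):
    """Return (severity, count) for a captured arecord output.

    severity is "ok" when no saturated line is seen, otherwise
    "warning" or "error" depending on max_is_error.
    """
    count = sum(1 for line in lines if MAX_LINE.match(line.strip()))
    if count == 0:
        return ("ok", 0)
    return ("error" if max_is_error else "warning", count)


sample = [
    "##################################################+| MAX",
    "##################################################+| MAX",
    "2024-06-06 16:24:06 UTC [REMOTE_INFO] Test Result: FAIL!",
]
print(classify_vu_output(sample))                     # "generous" mode
print(classify_vu_output(sample, max_is_error=True))  # strict mode
```

In this sketch, flipping the default of `max_is_error` plays the role of the one-line change: the detection stays the same and only the severity changes.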

marc-hb avatar Jul 15 '24 22:07 marc-hb

> That's why "MAX" can be turned into an error with just a one-line change in the new case-lib/apause.exp script. I intend to submit that change after a couple daily test runs.

We won't catch "MAX" volume after all because of the wontfix TGL bug:

  • https://github.com/thesofproject/linux/issues/3766

MAX will stay just a WARNING.

marc-hb avatar Jul 20 '24 00:07 marc-hb

Closing, please reply and re-open if you disagree.

marc-hb avatar Jul 29 '24 19:07 marc-hb