sof
sof copied to clipboard
[BUG] Failed capture tests on DELL SKU 0A32 with SOF 2.2
Describe the bug
Running sof-test with SOF 2.2 production-signed firmware results in failure for all capture tests
FAIL: capture-d100l1r1; capture_d1l100r1; capture_d1l1r50;
This is on a device with Fedora 36 installed (pipewire disabled).
To Reproduce TPLG=/lib/firmware/intel/sof-tplg/sof-tgl-rt711-rt1316-rt714.tplg MODEL=TGLU_SKU0A32_SDCA ~/sof-test/test-case/check-capture.sh -d 10 -l 1 -r 1
Reproduction Rate 100%
Expected behavior This device seems to be perfectly supported by SOF in Intel daily test, something is off in either firmware or environment.
Impact What impact does this issue have on your progress (e.g., annoyance, showstopper)
Environment
- Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
- Kernel: {55946d870a22796486c0bb97463945ba3c82c99a (HEAD -> merge/sound-upstream-20220801
- SOF: 2.2 production signed
- sof-test 6544bc19b5537e61febc6fd164f08de908ccbc49
- Name of the topology file
- Topology: sof-tgl-rt711-rt1316-rt714.tplg
- Name of the platform(s) on which the bug is observed.
- Platform: DELL SKU 0A32
Screenshots or console output
[root@fedora ~]# TPLG=/lib/firmware/intel/sof-tplg/sof-tgl-rt711-rt1316-rt714.tplg MODEL=TGLU_SKU0A32_SDCA ~/sof-test/test-case/check-capture.sh -d 10 -l 1 -r 1
2022-08-01 20:40:30 UTC [INFO] ktime=680 sof-test PID=1712: starting
2022-08-01 20:40:30 UTC [INFO] DUT is running IPC3 mode
2022-08-01 20:40:30 UTC [INFO] Starting /usr/local/bin/sof-logger -t -l /lib/firmware/intel/sof/sof-tgl.ldc
2022-08-01 20:40:30 UTC [INFO] /root/sof-test/test-case/check-capture.sh will use topology /usr/lib/firmware/intel/sof-tplg-v2.2/sof-tgl-rt711-rt1316-rt714.tplg to run the test case
2022-08-01 20:40:30 UTC [INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2022-08-01 20:40:30 UTC [INFO] Run command to get pipeline parameters
2022-08-01 20:40:30 UTC [COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-tplg-v2.2/sof-tgl-rt711-rt1316-rt714.tplg -f 'type:capture & id:any' -b ' pcm:HDA Digital' -s 0 -e
2022-08-01 20:40:30 UTC [INFO] ===== Testing: (Round: 1/1) (PCM: Jack In [hw:0,1]<capture>) (Loop: 1/1) =====
2022-08-01 20:40:30 UTC [INFO] no file prefix, use /dev/null as dummy capture output
2022-08-01 20:40:30 UTC [COMMAND] arecord -Dhw:0,1 -r 48000 -c 2 -f S16_LE -d 10 /dev/null -v -q
Hardware PCM card 0 'sof-soundwire' device 1 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 16384
period_size : 4096
period_time : 85333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 4096
period_event : 0
start_threshold : 1
stop_threshold : 16384
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
2022-08-01 20:40:41 UTC [ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2022-08-01 20:40:41 UTC [ERROR] arecord_opts() @ /root/sof-test/test-case/../case-lib/lib.sh
2022-08-01 20:40:41 UTC [ERROR] main() @ /root/sof-test/test-case/check-capture.sh:97
2022-08-01 20:40:41 UTC [INFO] Starting /usr/local/bin/sof-logger -l /lib/firmware/intel/sof/sof-tgl.ldc
2022-08-01 20:40:43 UTC [INFO] pkill -TERM sof-logger
Terminated
2022-08-01 20:40:44 UTC [INFO] nlines=3 /root/sof-test/logs/check-capture/2022-08-01-16:40:30-31205/etrace.txt
2022-08-01 20:40:44 UTC [INFO] ktime=694 sof-test PID=1712: ending
+ timeout -s CONT 7 sudo sync
real 0m0.032s
user 0m0.012s
sys 0m0.009s
+ timeout -s CONT 5 dd if=/dev/zero of=/root/HD_TEST_DELETE_ME bs=1M count=100 conv=fsync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0646952 s, 1.6 GB/s
+ timeout -s CONT 7 sudo sync
real 0m0.038s
user 0m0.003s
sys 0m0.026s
2022-08-01 20:40:45 UTC [INFO] Test Result: FAIL!
it's not exactly clear what happens from the log.
The same test seems to pass in Intel internal tests https://sof-ci.sh.intel.com/#/result/planresultdetail/14339?model=TGLU_SKU0A32_SDCA&testcase=check-capture-10sec
@keqiaozhang @marc-hb any ideas what could have gone sideways or be dependent on Ubuntu v. Fedora setups?
2022-08-01 20:40:41 UTC [ERROR] arecord_opts() @ /root/sof-test/test-case/../case-lib/lib.sh
2022-08-01 20:40:41 UTC [ERROR] main() @ /root/sof-test/test-case/check-capture.sh:97
It looks like arecord
is simply crashing without any error message.
There's no sof-test error message either because a long time I made the mistake of adding set -e
before fixing shellcheck warnings:
In test-case/check-capture.sh line 98:
if [[ $? -ne 0 ]]; then
^-- SC2181: Check exit code directly with e.g. 'if mycmd;', not indirectly with $?.
can you please try this fix?
--- a/test-case/check-capture.sh
+++ b/test-case/check-capture.sh
@@ -94,10 +94,10 @@ do
dlogi "using $file as capture output"
fi
- arecord_opts -D$dev -r $rate -c $channel -f $fmt_elem -d $duration $file -v -q
- if [[ $? -ne 0 ]]; then
+ arecord_opts -D$dev -r $rate -c $channel -f $fmt_elem -d $duration $file -v -q || {
func_lib_lsof_error_dump $snd
die "arecord on PCM $dev failed at $i/$loop_cnt."
- fi
+ }
done
done
@marc-hb thanks, I removed the spurious leftover 'fi' and looks indeed like an arecord status issue
arecord -Dhw:0,1 -r 48000 -c 2 -f S16_LE -d 10 /dev/null -v -q || echo "this arecord failed"
Hardware PCM card 0 'sof-soundwire' device 1 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 16384
period_size : 4096
period_time : 85333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 4096
period_event : 0
start_threshold : 1
stop_threshold : 16384
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
this arecord failed
@plbossart I assume if we arecord to a file rather than /dev/null the file is empty ? i.e. we dont capture any data ?
@marc-hb thanks, I removed the spurious leftover 'fi'
Thanks, I edited my comment above. As you have tested it, could you please submit this as a quick sof-test fix? Thanks in advance.
@plbossart I assume if we arecord to a file rather than /dev/null the file is empty ? i.e. we dont capture any data ?
indeed checking the file size would be a good way to check that something was captured...
@plbossart @marc-hb So is there any bug aside arecord crashing on Fedora?
@plbossart @marc-hb So is there any bug aside arecord crashing on Fedora?
it's not really crashing @kv2019i, only returning a status that is understood as a failure. That must be an upstream change IMHO, but I don't really have time to bisect it this week.
edit: I meant alsa-utils upstream change.
Ack @plbossart , let's keep this open. Just wondering whether this is the right tracker for the bug, or is this is a sof-test issue. But got it, we don't really know yet.
root caused to https://github.com/alsa-project/alsa-utils/issues/163, this is a regression in alsa-utils release 1.2.7, fixed in alsa-utils upstream.