LNL HDA alsabat capture failed, "Peak freq too low"
Originally posted by @marc-hb in https://github.com/thesofproject/sof/issues/9123#issuecomment-2125449102
https://sof-ci.01.org/sofpr/PR9151/build4789/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997 (ba-lnlm-rvp-hda-02)
2024-05-22 10:38:13 UTC [REMOTE_COMMAND] alsabat -Phw:CODEC,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 997 -k 2.1
2024-05-22 10:38:14 UTC [REMOTE_COMMAND] alsabat -Chw:sofhdadsp,0 -c 2 -r 48000 -f S16_LE -F 997 -k 2.1
FAIL: Peak freq too low 973.39 Hz
FAIL: Peak freq too low 974.85 Hz
FAIL: Peak freq too low 977.05 Hz
FAIL: Peak freq too low 978.52 Hz
FAIL: Peak freq too low 979.98 Hz
FAIL: Peak freq too low 982.18 Hz
FAIL: Peak freq too low 983.64 Hz
FAIL: Peak freq too low 987.30 Hz
FAIL: Peak freq too low 988.77 Hz
alsa-utils version 1.2.6
Entering capture thread (ALSA).
Get period size: 3000 buffer size: 24000
Recording ...
Capture completed.
BAT analysis: signal has 65536 frames at 48000 Hz, 2 channels, 2 bytes per sample.
Channel 1 - Checking for target frequency 997.00 Hz
Amplitude: 9878.5; Percentage: [30]
Detected peak at 973.39 Hz of 17.57 dB
Total 17.6 dB from 973.39 to 973.39 Hz
Detected peak at 974.85 Hz of 17.64 dB
Total 20.6 dB from 974.85 to 974.85 Hz
Detected peak at 977.05 Hz of 17.74 dB
Total 22.4 dB from 977.05 to 977.05 Hz
Detected peak at 978.52 Hz of 18.62 dB
Total 23.9 dB from 978.52 to 978.52 Hz
Detected peak at 979.98 Hz of 18.92 dB
Total 25.1 dB from 979.98 to 979.98 Hz
Detected peak at 982.18 Hz of 18.84 dB
Total 26.7 dB from 981.45 to 982.18 Hz
Detected peak at 983.64 Hz of 19.88 dB
Total 27.5 dB from 983.64 to 983.64 Hz
Detected peak at 987.30 Hz of 20.82 dB
Total 29.8 dB from 985.11 to 987.30 Hz
Detected peak at 988.77 Hz of 22.17 dB
Total 30.5 dB from 988.77 to 988.77 Hz
Detected peak at 996.83 Hz of 34.94 dB
Total 41.0 dB from 990.23 to 1004.15 Hz
PASS: Peak detected at target frequency
Detected at least 10 signal(s) in total
Return value is -1003
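To help read the log above, here is a minimal Python sketch of how the reported peak frequencies relate to FFT bins. It assumes the analysis runs a single FFT over all 65536 frames reported by BAT at 48000 Hz; the function names are hypothetical and are not part of alsabat.

```python
def fft_bin_width(rate_hz, n_frames):
    """Frequency spacing between adjacent FFT bins, in Hz."""
    return rate_hz / n_frames


def nearest_bin(freq_hz, rate_hz, n_frames):
    """Index of the FFT bin whose center is closest to freq_hz."""
    return round(freq_hz * n_frames / rate_hz)


def bin_frequency(index, rate_hz, n_frames):
    """Center frequency of FFT bin `index`, in Hz."""
    return index * rate_hz / n_frames


if __name__ == "__main__":
    rate, frames = 48000, 65536  # values taken from the BAT log above
    print("bin width:", fft_bin_width(rate, frames), "Hz")
    for freq in (997.0, 996.83, 973.39):
        idx = nearest_bin(freq, rate, frames)
        print(freq, "Hz -> bin", idx, "centered at",
              round(bin_frequency(idx, rate, frames), 2), "Hz")
```

Under that assumption the bin width is 0.732421875 Hz, the 997 Hz target falls closest to bin 1361 (centered at 996.83 Hz, the peak that passes), and the lowest failing peak at 973.39 Hz is bin 1329, that is 32 bins below the target.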
Again https://sof-ci.01.org/softestpr/PR1200/build467/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-821
@fredoh9 how many jack connectors do we have on the RVP? Is the same 3.5mm jack connector used for both the HDaudio and SoundWire reworks?
> @fredoh9 how many jack connectors do we have on the RVP?
Only one jack. I don't remember any RVP with more than one jack.
> is the same 3.5mm jack connector for HDaudio and SoundWire reworks?
I don't know. I looked at jf-lnlm-rvp-hda-1 and the USB loopback goes through the AIOC board; the RVP jack is not connected. On the other hand, this device is not enabled in CI right now :-( I'm afraid we have to wait for @fredoh9 for this one.
I just noticed someone disabled ba-lnlm-rvp-hda-02, on which the two failures were found...
@plbossart We have only one jack on the RVP; I don't remember any RVP with multiple jacks.
@marc-hb For LNLM_RVP_HDA, the RVP doesn't have an HDA codec, so we attached an external AIOC. Hence the jack being used for the HDA codec is the one on the external AIOC.
> I just noticed someone disabled ba-lnlm-rvp-hda-02
From @ssavati : silicon being upgraded.
@marc-hb the ba-lnlm-rvp-hda-02 board is now back up with the silicon upgrade. It uses community firmware.
Very recent one today: https://sof-ci.01.org/softestpr/PR1180/build512/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-playback-821
Underrun: Broken pipe(-32)
FAIL: Peak freq too low 799.07 Hz
FAIL: Peak freq too low 802.73 Hz
FAIL: Peak freq too low 806.40 Hz
FAIL: Peak freq too low 807.86 Hz
FAIL: Peak freq too high 834.96 Hz
FAIL: Peak freq too high 838.62 Hz
FAIL: Peak freq too high 842.29 Hz
Got one reproduction of this. I had to run the test 200+ times to hit the problem, so the occurrence rate would seem to be below 1%. In the one failing case, the captured bat.wav looks good but the analysis fails, so it is not yet clear what is happening, but the error looks similar to the one in the original report. @jsarha, if you can hit this, please add data. @marc-hb is this roughly in line with the occurrence rate you've seen?
I would guess the reproduction rate is somewhere between 0.1% and 5% so, yes: 1% falls in that range :-)
Did 249 more runs on ba-lnlm-rvp-hda-02, no failures. I'll pick a machine again tomorrow and run for a couple of hours more. It bothers me a bit that there is an obvious glitch in the audio in both of the referenced occurrences, but they are not at all similar. It's very hard to imagine a common cause for both of them.
Here is one more occurrence, very much like the one in https://github.com/thesofproject/sof/issues/9164#issuecomment-2164285904 . There is a gap of around 0.2 s (a bit longer in one, a bit shorter in the other) in the middle of the capture, then a little glitch ~0.015 s before the signal starts to come back in a ramp.
That was a bit under 300 test runs this morning, and one occurrence.
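As a rough aid for spotting gaps like the one described above without opening each capture in an editor, here is a minimal Python sketch that lists runs of near-silent samples. It assumes the gap shows up as consecutive samples very close to zero (the attached captures may contain low-level noise instead) and that one channel has already been loaded into a list of integers, for example with the standard `wave` module. The function name and thresholds are hypothetical.

```python
def find_gaps(samples, rate_hz, silence_level, min_len):
    """Return (start_index, length_in_samples, length_in_seconds) for each
    run of at least min_len consecutive samples with |value| <= silence_level."""
    gaps = []
    run_start = None
    for i, value in enumerate(samples):
        if abs(value) <= silence_level:
            if run_start is None:
                run_start = i          # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                length = i - run_start
                gaps.append((run_start, length, length / rate_hz))
            run_start = None           # signal is back, reset
    # Handle a quiet run that lasts until the end of the capture.
    if run_start is not None and len(samples) - run_start >= min_len:
        length = len(samples) - run_start
        gaps.append((run_start, length, length / rate_hz))
    return gaps
```

`min_len` should be larger than the few samples a sine wave spends near each zero crossing; something like 1 ms worth of samples is a reasonable starting point, to be tuned against real captures.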
occurrence_2024-06-14-10.zip occurrence_2024-06-14-11-25.zip occurrence_2024-06-14-11-46.zip
Here are three more occurrences, but I do not think it's the same issue. They look more like test setup failures to me. The bat.wav files look perfectly Ok, but for some reason the validation fails.
> The bat.wav files look perfectly Ok, but for some reason the validation fails.
I'm not familiar with alsabat but I've been told that it is much more sensitive than the human eye or even ear.
I did not try to analyze it with my ears, but did frequency analysis with Audacity, and the frequency peak was there exactly in the right place, with no other local peaks.
There's a small glitch in occurrence_2024-06-14-11-46.zip around 0.6740, which explains the alsabat fail.
occurrence-2024-06-17-11-04.zip occurrence-gap-2024-06-17-12-02.zip
Two more cases. One has no immediately obvious fault, but probably some subtle discontinuity in the sine wave somewhere. The other has the obvious gap pattern. The gap is a bit wider this time, about 700 ms.
> There's a small glitch in occurrence_2024-06-14-11-46.zip around 0.6740, which explains the alsabat fail.
Can you please summarize how you found (with Audacity?) what @jsarha didn't?
I think the same tools can be used. I missed this as well at first, as the expectation was a big visible gap or a repeating glitch pattern. I noticed the glitch when listening to the file. I then ran frequency analysis in Audacity on small segments to narrow the search further, and did the final steps by manual analysis of the sample values (zooming into the waveform display and/or exporting the sample values to a text file) to find the exact point.
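The search described above was done by ear and in Audacity. As a complementary sketch, and not the method used in this thread, a small discontinuity in a pure test tone can also be located numerically: a sampled sine of known frequency satisfies x[n] = 2·cos(w)·x[n-1] − x[n-2], so the residual of that recurrence stays near zero and spikes where the waveform is broken. The code below assumes the tone frequency is known (the `-F` value), that one channel is already loaded into a list, and that the threshold is tuned by hand; all names are hypothetical.

```python
import math


def sine_residual(samples, freq_hz, rate_hz):
    """Residual of x[n] - 2*cos(w)*x[n-1] + x[n-2] for n >= 2.
    For an ideal sine at freq_hz this is ~0 at every sample."""
    k = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate_hz)
    return [samples[n] - k * samples[n - 1] + samples[n - 2]
            for n in range(2, len(samples))]


def find_glitches(samples, freq_hz, rate_hz, threshold):
    """Sample indices where the recurrence residual exceeds threshold."""
    residual = sine_residual(samples, freq_hz, rate_hz)
    # residual[i] belongs to sample index i + 2
    return [i + 2 for i, r in enumerate(residual) if abs(r) > threshold]


if __name__ == "__main__":
    rate, freq, amp = 48000, 997.0, 10000.0
    w = 2.0 * math.pi * freq / rate
    clean = [amp * math.sin(w * n) for n in range(rate)]
    # Synthetic glitch: drop 5 samples at an arbitrary position.
    broken = clean[:32352] + clean[32357:]
    print(find_glitches(broken, freq, rate, threshold=1.0))
```

On a real capture the residual is never exactly zero: 16-bit quantization, analog noise and any clock mismatch between the playback and capture devices all contribute, so the threshold has to sit well above that noise floor.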
Not sure how much light this sheds on the issue, but I first ran 673 successful rounds of the alsabat test [1] using sof-hda-benchmark-gain32.tplg, and then quit the test script without a single failure. Then I restored the original sof-hda-generic-ace1-4ch.tplg and was able to run 263 rounds before I hit the error. BUT, the error I hit is of a completely new class: the signal is simply cut off in the middle of the sample and does not resume. This was all with FW commit 3da8e6474531411bef64819113a29a7edbd51fcf. The test logs and the failed test case are in the attached zip.
testlogs-and-failed-testcase.zip
[1] TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-alsabat.sh -p hw:sofhdadsp,0 -c hw:CODEC,0 -C 2 -F 821
https://sof-ci.01.org/linuxpr/PR5075/build3763/devicetest/index.html
Also daily test run 42929?model=LNLM_RVP_NOCODEC&testcase=check-alsabat-nocodec-32bits-599
June 20th: https://sof-ci.01.org/softestpr/PR1180/build552/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997
Oh, this is yet another new type of failure. There is a gap of only ~6 ms and, a bit suspiciously, the signal continues from exactly the same phase after the gap. A buffer underrun somewhere?
There are a couple of FW log messages like these in the middle of the test log (so not at setup or tear-down time):
[ 771.127966] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x10004 no bytes to copy, available samples: 0, free_samples: 384
[ 771.127991] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_zephyr_copy(): nothing to copy
[ 771.128913] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x10004 no bytes to copy, available samples: 0, free_samples: 384
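To make the "same phase after the gap" observation measurable, here is a minimal Python sketch that estimates the tone's phase on both sides of a gap and checks which of two hypotheses fits better: the stream was paused (silence inserted, no samples lost) or samples were dropped while the source kept running. It assumes a single known tone frequency, a known gap position and length, and at least `window` clean samples on each side; the names are hypothetical. Note that at 997 Hz a gap of exactly 6 ms spans about 5.98 cycles, so for that gap length the two hypotheses differ only by a small phase offset and the gap length must be measured precisely.

```python
import math


def estimate_phase(samples, start, count, freq_hz, rate_hz):
    """Phase in radians of a sine at freq_hz, referenced to index `start`,
    estimated by correlating with sin/cos over `count` samples."""
    w = 2.0 * math.pi * freq_hz / rate_hz
    i_sum = sum(samples[start + n] * math.sin(w * n) for n in range(count))
    q_sum = sum(samples[start + n] * math.cos(w * n) for n in range(count))
    return math.atan2(q_sum, i_sum)


def wrap(angle):
    """Wrap an angle to the range (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def classify_gap(samples, gap_start, gap_len, freq_hz, rate_hz, window=4800):
    """Return 'paused' if the phase resumes where it stopped,
    'dropped' if it advanced by gap_len samples during the gap."""
    w = 2.0 * math.pi * freq_hz / rate_hz
    before = estimate_phase(samples, gap_start - window, window,
                            freq_hz, rate_hz)
    after = estimate_phase(samples, gap_start + gap_len, window,
                           freq_hz, rate_hz)
    expected_at_gap = before + w * window  # phase the tone had at gap_start
    err_paused = abs(wrap(after - expected_at_gap))
    err_dropped = abs(wrap(after - expected_at_gap - w * gap_len))
    return "paused" if err_paused < err_dropped else "dropped"
```

A "paused" result would be consistent with the underrun suspicion above, but it is only a hint; it says nothing about where in the pipeline the stall happened.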
Today I ran another 680 rounds of the alsabat test with sof-hda-benchmark-gain32.tplg, to make sure that I did not just get lucky last Friday. That is, I re-ran this test:
https://github.com/thesofproject/sof/issues/9164#issuecomment-2181086368
To complete this test, I ran another 230 rounds with the same daily build (20240623/sof-28a5265568a8-1), this time with the standard generic-ace1-4ch topology, and hit the "phase shift" error again [1]. It starts to look like the issue does not show up with the gain-widget-only topology. I'll try some other benchmark topologies next, to see if I can find the problematic widget that way.
I hacked alsa-utils alsabat to allow 100M frames in the test instead of just 10M, and also increased MAX_PEAK tenfold to 100. I then also needed to fix sof-test check-alsabat.sh to pass the -n parameter to the capture part of the test as well. This allowed me to run an alsabat test of a bit over 36 min, which was successful. I do not fully understand the alsabat code or how my hacks may affect it, so the result should be taken with a grain of salt, but in any case the test log is here:
Thanks @jsarha - results look good to me. We should upstream the alsa-bat update.
One case today in PR testing, see Intel test run 43099. This has a 3000 sample (one host period?) gap like https://github.com/thesofproject/sof/issues/9164#issuecomment-2167684205 but in this case, audio is corrupt after the gap (dominant 599Hz tone present but multiple other tones). The left-right signals show different values.
It took 724 rounds, but then I got a glitch with sof-hda-benchmark-drc32.tplg. Not sure if this points directly at the DRC widget, but anyway here is the evidence:
Reproduction rate low in past week, continue with P2 and assign to v2.11.
I became suspicious about last week's findings and decided to repeat them with enough cycles to know with reasonable certainty in which configurations the issue happens and in which it does not. So I ran the test again using the following configurations:
- benchmark-gain32 topology, issue occurred after 583 cycles [1]
- DSPless mode, issue occurred after 852 cycles [2]
- nocodec, playback and capture from hw:0,0, no issue found after 1214 cycles [3]
This now starts to point either to the HDA interface or to the USB audio device that is used for capture in the DUT. It would be nice to be able to test this with a loop-back cable from RVP line-out to line-in.
[1] alsabat-benchmark-gain32.zip [2] alsabat-DSPless.zip [3] alsabat-nocodec-log.txt
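To put the cycle counts above in perspective, here is a minimal Python sketch of how likely a clean run is for a given per-cycle failure rate. It assumes independent cycles with a constant failure probability, which may not hold for this setup, and a rate guessed from a single observed failure is itself very uncertain. The function names are hypothetical.

```python
import math


def prob_no_failure(rate, runs):
    """Probability of seeing zero failures in `runs` independent cycles
    when each cycle fails with probability `rate`."""
    return (1.0 - rate) ** runs


def runs_for_confidence(rate, confidence):
    """Smallest number of clean cycles after which a failure rate of at
    least `rate` can be ruled out with the given confidence (0..1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


if __name__ == "__main__":
    # Rates guessed from "issue occurred after N cycles" in the list above.
    for label, rate in (("gain32", 1 / 583), ("DSPless", 1 / 852)):
        p_clean = prob_no_failure(rate, 1214)
        print(f"{label}: P(no failure in 1214 cycles) = {p_clean:.3f}, "
              f"cycles for 95% = {runs_for_confidence(rate, 0.95)}")
```

With these guessed rates, 1214 clean nocodec cycles would still happen by chance roughly 12% of the time at the gain32 rate and roughly 24% of the time at the DSPless rate, so the nocodec result is suggestive rather than conclusive.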