Genlocked alsa loopback problem with Wolfram wm8731 on 6.6.40
Describe the bug
After upgrading to kernel 6.6.40 from 6.6.31 (same problem with 6.6.42), I can not use the alsa loopback devices genlocked to the Wolfram 8731 sound device.
Steps to reproduce the behaviour
Just update kernel and run an application which opens the alsa loopback devices.
Device (s)
Raspberry Pi 5, Raspberry Pi 4
System
cat /etc/rpi-issue
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 2acf7afcba7d11500313a7b93bb55a2aae20b2d6, stage2
vcgencmd version
2024/07/30 15:25:46 Copyright (c) 2012 Broadcom version 790da7ef (release) (embedded)
uname -a
Linux gateway.hermes.radio 6.6.42-v8+ #1787 SMP PREEMPT Thu Jul 25 17:44:41 BST 2024 aarch64 GNU/Linux
Logs
Initial dmesh output. Nothing suspicious to me.
(...) [ 4.826923] wm8731 1-001a: Assuming static MCLK [ 4.826953] wm8731 1-001a: supply AVDD not found, using dummy regulator [ 4.827050] wm8731 1-001a: supply HPVDD not found, using dummy regulator [ 4.827074] wm8731 1-001a: supply DCVDD not found, using dummy regulator [ 4.827101] wm8731 1-001a: supply DBVDD not found, using dummy regulator [ 4.831445] audioinjector-stereo soc:sound: ASoC: driver name too long 'audioinjector-pi-soundcard' -> 'audioinjector-p' [ 4.868650] audioinjector-stereo soc:sound: successfully loaded (...)
After opening the alsa loopback, thousands per second of: [ 242.922102] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922117] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922152] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922157] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922178] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922182] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922203] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922207] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16
Then the soundcard itself starts to misbehave (crazy sounds heard): [ 614.169627] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 629.590104] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 662.121953] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 675.205707] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors
snd-aloop module parameters: options snd-aloop enable=1,1,1 index=1,2,3 timer_source=hw:0,0
Additional context
No response
Could you try some of the intervening releases using the hashes on this page?:
https://github.com/raspberrypi/rpi-firmware/commits/master/
(sudo rpi-update <hash>)
Yes. Will do it.
Kernel 6.6.32 works. Kernel 6.6.33 is broken with the described issue.
I've looked through the 760 commits in that range and, although I have some suspicions, there are no obvious culprits. It would help me if you could unpack the phrase "alsa loopback devices genlocked to the Wolfram 8731 sound device":
- How is this different from normal play/record activity?
- Does normal play/record activity work?
- Is this configuration the simplest that demonstrates the problem?
Without having one of these soundcards, I think we'll end up with a pull-request with sets of commits reverted until we can pinpoint the offending one.
Right. I'll do more tests, but answering the questions:
- The only difference is that the alsa loopback is being driven by the wm8731 (timer_source=hw:0,0 to the alsa loopback module, where hw:0,0 is the wm8731)
- It works ok until I start to see "dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors" in kernel log
- I'll do more tests and write a simple C code to demonstrate (the production software is not so small, available here: https://github.com/Rhizomatica/hermes-net/tree/main/trx_v2-userland )
I'll also do more tests in the Pi 4 to confirm it is Pi5-specific, in case it helps.
- So by "genlocked alsa loopback" you mean the system is capturing samples from the soundcard input and replaying them from the output, using the same clock for both so that there should be a one-to-one correspondence?
- Understood.
- That would be helpful, especially if the problem is shared with other soundcards.
- So "genlocked alsa loopback" I just mean that I'm using alsa loopback "genlock" feature (use another sound card as clock source) by adding ¨timer_source=hw:0,0". The DSP pipeline is a bit more complicated than just a replay of the samples. You are right about the "one-to-one" correspondence.
- Right. I'll work on it.
Btw, I think the Pi5 has enough juice to compile kernel. If I want to bisect the issue, which is the first commit after 6.6.32 you would guess to start with?
I'll prepare a branch with all the relevant commits as cherry-picks, then you'll be able to use git bisect.
thanks!
If you git fetch now you should find you have origin/wm8731test.
$ git checkout origin/wm8731test
$ git bisect start
$ git bisect good 5882bc5db17dd
$ git bisect bad HEAD
Bisecting: 376 revisions left to test after this (roughly 9 steps)
[bc095ce83c5d4d44392c5f497f73fdfab2d2053d] RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters
Build and test each kernel, and after each test run either git bisect good or git bisect bad, and with a bit of luck (for no broken builds) it will converge on the culprit.
Thanks @pelwell!!
I'm still working on it. Please hang on. : )
I'm not going anywhere (this week)
I already found a commit that the software works, but I still get...
[ 2773.338718] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors
now at [f00c1597f4d6a1930de49beb68e72ac78eefce54] nvme: cancel pending I/O if nvme controller is in terminal state
That's good progress. I hope you don't have many steps to go.
It's been fun to learn "git bisect". : )
Just in case you're being unnecessarily cautious, the kernel build system is good at managing dependencies. It's very rare that I do a make clean, even when switching between architectures - it just seems to work - and incremental builds are usually much quicker than starting from scratch.
Thanks @pelwell! I'm being cautios with the testing, and at different commits, I start to get different errors, but more hard to get, like "dma2chan0 is non-idle" followed by "BUG: dma2chan0, IRQ with no descriptors". My fear is that I'll arrive to the "first" commit of testing with "bisect bad". Then I'll have to do it all again, but change the criteria - accept weird errors and just consider the software working or not.
Btw, I think it is unrelated, but in production I use the 4k per page kernel (because of some ia32 emulation which does not work with 16k pagesize), but now I see the defconfig creates me the 16k per page version. Do you know by heart the option that changes the page size?
It sounds like you may end up with multiple culprit commits that introduce different behaviours - errors or failures. As soon as you find any single commit that introduces a change, let me know.
The only difference between arm64/bcm2712_defconfig and arm64/bcm2711_defconfig is the page size. If you want to build for Pi 5 using 4kB pages (which is my standard configuration) just use bcm2711_defconfig.
Got a good commit. Yay!!
Thanks. Will start using bcm2711_defconfig then. : )
Uff, no deal. Try to set bad when I got errors in dmesg was fail for me... I think I'll investigate with some older kernels. Is there a list with pre-compiled kernels I can install with rpi-update?
At the same time I'll restart the git bisect to try to pinpoint the commit which broke the software (and I'll ignore scary messages from kernel).
btw, I ended up bumping in this commit... which seems unrelated:
1ffad06480711a5bdcaff95bc3b22a0b4b69a772 is the first bad commit
commit 1ffad06480711a5bdcaff95bc3b22a0b4b69a772
Author: Doug Berger <[email protected]>
Date: Wed Apr 24 15:25:59 2024 -0700
serial: 8250_bcm7271: use default_mux_rate if possible
The rpi-firmware repo I mentioned previously (https://github.com/raspberrypi/rpi-firmware/commits/master) is the only readily-accessible, long-lived archive of kernel builds.
Yay! Found the culprit: 6d6ff6bf5fd00939f19e8f4f63184998b50f4f56 ALSA: timer: Set lower bound of start tick time
Somehow git bisect said this was the wrong one... [127e8c9044f0efad667e868f18d4a55bb90e8c22] ALSA: seq: ump: Fix swapped song position pointer data
ps: I edited this post.
I'll patch 6.6.y HEAD and see if I get things back working.
So, definitely this commit [6d6ff6bf5fd00939f19e8f4f63184998b50f4f56] broke aloop with hw_timer. Good. Software is back working.
But there are still outstanding issues introduced at some other commit leading to crazy audio distortions and errors like: dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors
I'm still finding a way to reproduce the problem in a deterministic way.
patch to 6.6.y head:
diff --git a/sound/core/timer.c b/sound/core/timer.c
index a0b515981ee9..e6e551d4a29e 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -553,16 +553,6 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
goto unlock;
}
- /* check the actual time for the start tick;
- * bail out as error if it's way too low (< 100us)
- */
- if (start) {
- if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) {
- result = -EINVAL;
- goto unlock;
- }
- }
-
if (start)
timeri->ticks = timeri->cticks = ticks;
else if (!timeri->cticks)
Right. All good, it seems the outstanding issues are only triggered by ffmpeg. My software and other simple alsa sample code (including speaker-test) works fine.