linux icon indicating copy to clipboard operation
linux copied to clipboard

Genlocked alsa loopback problem with Wolfram wm8731 on 6.6.40

Open rafael2k opened this issue 1 year ago • 45 comments

Describe the bug

After upgrading to kernel 6.6.40 from 6.6.31 (same problem with 6.6.42), I can not use the alsa loopback devices genlocked to the Wolfram 8731 sound device.

Steps to reproduce the behaviour

Just update kernel and run an application which opens the alsa loopback devices.

Device (s)

Raspberry Pi 5, Raspberry Pi 4

System

cat /etc/rpi-issue

Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 2acf7afcba7d11500313a7b93bb55a2aae20b2d6, stage2

vcgencmd version

2024/07/30 15:25:46 Copyright (c) 2012 Broadcom version 790da7ef (release) (embedded)

uname -a

Linux gateway.hermes.radio 6.6.42-v8+ #1787 SMP PREEMPT Thu Jul 25 17:44:41 BST 2024 aarch64 GNU/Linux

Logs

Initial dmesh output. Nothing suspicious to me.

(...) [ 4.826923] wm8731 1-001a: Assuming static MCLK [ 4.826953] wm8731 1-001a: supply AVDD not found, using dummy regulator [ 4.827050] wm8731 1-001a: supply HPVDD not found, using dummy regulator [ 4.827074] wm8731 1-001a: supply DCVDD not found, using dummy regulator [ 4.827101] wm8731 1-001a: supply DBVDD not found, using dummy regulator [ 4.831445] audioinjector-stereo soc:sound: ASoC: driver name too long 'audioinjector-pi-soundcard' -> 'audioinjector-p' [ 4.868650] audioinjector-stereo soc:sound: successfully loaded (...)

After opening the alsa loopback, thousands per second of: [ 242.922102] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922117] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922152] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922157] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922178] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922182] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16 [ 242.922203] snd_aloop snd_aloop.1: snd_timer_start(0,0,0) failed with -22 [ 242.922207] snd_aloop snd_aloop.1: snd_timer_stop(0,0,0) failed with -16

Then the soundcard itself starts to misbehave (crazy sounds heard): [ 614.169627] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 629.590104] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 662.121953] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors [ 675.205707] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors

snd-aloop module parameters: options snd-aloop enable=1,1,1 index=1,2,3 timer_source=hw:0,0

Additional context

No response

rafael2k avatar Aug 01 '24 20:08 rafael2k

Could you try some of the intervening releases using the hashes on this page?: https://github.com/raspberrypi/rpi-firmware/commits/master/ (sudo rpi-update <hash>)

pelwell avatar Aug 01 '24 21:08 pelwell

Yes. Will do it.

rafael2k avatar Aug 01 '24 23:08 rafael2k

Kernel 6.6.32 works. Kernel 6.6.33 is broken with the described issue.

rafael2k avatar Aug 01 '24 23:08 rafael2k

I've looked through the 760 commits in that range and, although I have some suspicions, there are no obvious culprits. It would help me if you could unpack the phrase "alsa loopback devices genlocked to the Wolfram 8731 sound device":

  1. How is this different from normal play/record activity?
  2. Does normal play/record activity work?
  3. Is this configuration the simplest that demonstrates the problem?

Without having one of these soundcards, I think we'll end up with a pull-request with sets of commits reverted until we can pinpoint the offending one.

pelwell avatar Aug 02 '24 09:08 pelwell

Right. I'll do more tests, but answering the questions:

  1. The only difference is that the alsa loopback is being driven by the wm8731 (timer_source=hw:0,0 to the alsa loopback module, where hw:0,0 is the wm8731)
  2. It works ok until I start to see "dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors" in kernel log
  3. I'll do more tests and write a simple C code to demonstrate (the production software is not so small, available here: https://github.com/Rhizomatica/hermes-net/tree/main/trx_v2-userland )

I'll also do more tests in the Pi 4 to confirm it is Pi5-specific, in case it helps.

rafael2k avatar Aug 02 '24 10:08 rafael2k

  1. So by "genlocked alsa loopback" you mean the system is capturing samples from the soundcard input and replaying them from the output, using the same clock for both so that there should be a one-to-one correspondence?
  2. Understood.
  3. That would be helpful, especially if the problem is shared with other soundcards.

pelwell avatar Aug 02 '24 10:08 pelwell

  1. So "genlocked alsa loopback" I just mean that I'm using alsa loopback "genlock" feature (use another sound card as clock source) by adding ¨timer_source=hw:0,0". The DSP pipeline is a bit more complicated than just a replay of the samples. You are right about the "one-to-one" correspondence.
  2. Right. I'll work on it.

rafael2k avatar Aug 02 '24 10:08 rafael2k

Btw, I think the Pi5 has enough juice to compile kernel. If I want to bisect the issue, which is the first commit after 6.6.32 you would guess to start with?

rafael2k avatar Aug 02 '24 10:08 rafael2k

I'll prepare a branch with all the relevant commits as cherry-picks, then you'll be able to use git bisect.

pelwell avatar Aug 02 '24 10:08 pelwell

thanks!

rafael2k avatar Aug 02 '24 11:08 rafael2k

If you git fetch now you should find you have origin/wm8731test.

$ git checkout origin/wm8731test
$ git bisect start
$ git bisect good 5882bc5db17dd 
$ git bisect bad HEAD
Bisecting: 376 revisions left to test after this (roughly 9 steps)
[bc095ce83c5d4d44392c5f497f73fdfab2d2053d] RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters

Build and test each kernel, and after each test run either git bisect good or git bisect bad, and with a bit of luck (for no broken builds) it will converge on the culprit.

pelwell avatar Aug 02 '24 11:08 pelwell

Thanks @pelwell!!

rafael2k avatar Aug 02 '24 11:08 rafael2k

I'm still working on it. Please hang on. : )

rafael2k avatar Aug 05 '24 14:08 rafael2k

I'm not going anywhere (this week)

pelwell avatar Aug 05 '24 14:08 pelwell

I already found a commit that the software works, but I still get...

[ 2773.338718] dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors

now at [f00c1597f4d6a1930de49beb68e72ac78eefce54] nvme: cancel pending I/O if nvme controller is in terminal state

rafael2k avatar Aug 05 '24 19:08 rafael2k

That's good progress. I hope you don't have many steps to go.

pelwell avatar Aug 05 '24 20:08 pelwell

It's been fun to learn "git bisect". : )

rafael2k avatar Aug 05 '24 21:08 rafael2k

Just in case you're being unnecessarily cautious, the kernel build system is good at managing dependencies. It's very rare that I do a make clean, even when switching between architectures - it just seems to work - and incremental builds are usually much quicker than starting from scratch.

pelwell avatar Aug 06 '24 07:08 pelwell

Thanks @pelwell! I'm being cautios with the testing, and at different commits, I start to get different errors, but more hard to get, like "dma2chan0 is non-idle" followed by "BUG: dma2chan0, IRQ with no descriptors". My fear is that I'll arrive to the "first" commit of testing with "bisect bad". Then I'll have to do it all again, but change the criteria - accept weird errors and just consider the software working or not.

rafael2k avatar Aug 06 '24 10:08 rafael2k

Btw, I think it is unrelated, but in production I use the 4k per page kernel (because of some ia32 emulation which does not work with 16k pagesize), but now I see the defconfig creates me the 16k per page version. Do you know by heart the option that changes the page size?

rafael2k avatar Aug 06 '24 10:08 rafael2k

It sounds like you may end up with multiple culprit commits that introduce different behaviours - errors or failures. As soon as you find any single commit that introduces a change, let me know.

The only difference between arm64/bcm2712_defconfig and arm64/bcm2711_defconfig is the page size. If you want to build for Pi 5 using 4kB pages (which is my standard configuration) just use bcm2711_defconfig.

pelwell avatar Aug 06 '24 10:08 pelwell

Got a good commit. Yay!!

Thanks. Will start using bcm2711_defconfig then. : )

rafael2k avatar Aug 06 '24 10:08 rafael2k

Uff, no deal. Try to set bad when I got errors in dmesg was fail for me... I think I'll investigate with some older kernels. Is there a list with pre-compiled kernels I can install with rpi-update?

At the same time I'll restart the git bisect to try to pinpoint the commit which broke the software (and I'll ignore scary messages from kernel).

rafael2k avatar Aug 06 '24 11:08 rafael2k

btw, I ended up bumping in this commit... which seems unrelated:

1ffad06480711a5bdcaff95bc3b22a0b4b69a772 is the first bad commit
commit 1ffad06480711a5bdcaff95bc3b22a0b4b69a772
Author: Doug Berger <[email protected]>
Date:   Wed Apr 24 15:25:59 2024 -0700

    serial: 8250_bcm7271: use default_mux_rate if possible

rafael2k avatar Aug 06 '24 11:08 rafael2k

The rpi-firmware repo I mentioned previously (https://github.com/raspberrypi/rpi-firmware/commits/master) is the only readily-accessible, long-lived archive of kernel builds.

pelwell avatar Aug 06 '24 13:08 pelwell

Yay! Found the culprit: 6d6ff6bf5fd00939f19e8f4f63184998b50f4f56 ALSA: timer: Set lower bound of start tick time

Somehow git bisect said this was the wrong one... [127e8c9044f0efad667e868f18d4a55bb90e8c22] ALSA: seq: ump: Fix swapped song position pointer data

ps: I edited this post.

rafael2k avatar Aug 06 '24 23:08 rafael2k

I'll patch 6.6.y HEAD and see if I get things back working.

rafael2k avatar Aug 06 '24 23:08 rafael2k

So, definitely this commit [6d6ff6bf5fd00939f19e8f4f63184998b50f4f56] broke aloop with hw_timer. Good. Software is back working.

But there are still outstanding issues introduced at some other commit leading to crazy audio distortions and errors like: dma dma2chan0: BUG: dma2chan0, IRQ with no descriptors

I'm still finding a way to reproduce the problem in a deterministic way.

rafael2k avatar Aug 07 '24 01:08 rafael2k

patch to 6.6.y head:

diff --git a/sound/core/timer.c b/sound/core/timer.c
index a0b515981ee9..e6e551d4a29e 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -553,16 +553,6 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
                goto unlock;
        }
 
-       /* check the actual time for the start tick;
-        * bail out as error if it's way too low (< 100us)
-        */
-       if (start) {
-               if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) {
-                       result = -EINVAL;
-                       goto unlock;
-               }
-       }
-
        if (start)
                timeri->ticks = timeri->cticks = ticks;
        else if (!timeri->cticks)

rafael2k avatar Aug 07 '24 01:08 rafael2k

Right. All good, it seems the outstanding issues are only triggered by ffmpeg. My software and other simple alsa sample code (including speaker-test) works fine.

rafael2k avatar Aug 07 '24 01:08 rafael2k