ath11k kernel panic on Pi 5 with 8GB RAM, but not on 2GB (DMA/PCI-E kernel panic)
Describe the bug
The ath11k kernel module works as expected on a Raspberry Pi 5 board with 2GB RAM, but the same image (same boot media) fails with a DMA/memory-related kernel panic on the 8GB unit.
Limiting the memory of the 8GB unit to 2GB (mem=2G in cmdline.txt) fixes the issue. Detailed logs below.
Test setup consists of two units:
Unit 1: Raspberry Pi 5, 2GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)
Unit 2: Raspberry Pi 5, 8GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)
The WiFi modules used are the same brand/model and from the same batch, and the boot media is shared across the two units to rule out config-related issues.
Not entirely sure if this is specifically an issue with the ath11k driver, as it seems to work on other platforms. Perhaps this is a BCM2712 DMA / PCI-E restriction? Speculating, of course. Thanks in advance for your assistance.
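To illustrate why the amount of RAM might matter at all: if the endpoint is restricted to 32-bit DMA addresses (as the dtoverlay=pcie-32bit-dma setting in the config.txt further down requests), any buffer located above 4 GiB cannot be reached directly and has to be bounced. The sketch below only shows that arithmetic. It assumes RAM is mapped contiguously from physical address 0 and that bus addresses equal physical addresses, which simplifies the real BCM2712 memory map; needs_bounce and highest_ram_address are hypothetical helper names, not kernel functions.

```python
GIB = 1 << 30


def needs_bounce(phys_addr: int, size: int, dma_bits: int = 32) -> bool:
    """Return True if any byte of [phys_addr, phys_addr + size) lies above the DMA mask."""
    dma_mask = (1 << dma_bits) - 1
    last_byte = phys_addr + size - 1
    return last_byte > dma_mask


def highest_ram_address(ram_bytes: int) -> int:
    # Assumption: RAM starts at physical address 0 and has no holes.
    return ram_bytes - 1


for ram_gib in (2, 8):
    top = highest_ram_address(ram_gib * GIB)
    # Check a 4 KiB buffer placed at the very top of RAM.
    bounce = needs_bounce(top - 4095, 4096)
    print(f"{ram_gib} GiB unit: top of RAM {top:#x}, needs bounce: {bounce}")
```

Under these assumptions every buffer on the 2GB unit is directly reachable, while on the 8GB unit buffers above 4 GiB are not. This would be consistent with mem=2G hiding the problem, but it does not by itself explain the panic.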
Steps to reproduce the behaviour
(On a fresh device, with no WiFi configuration)
- Compile, install and boot up the custom kernel
- Use nmcli to connect to a WiFi network - kernel panic
(With a valid WiFi configuration set up)
- Boot device with custom kernel
- kernel panic
Device (s)
Raspberry Pi 5
System
Kernel version:
$ git rev-parse HEAD
84ab77459e61c648299d32464127b89ca65de40a
$ uname -a
Linux raspberrypi 6.6.56-v8-16k-x+ #1 SMP PREEMPT Thu Oct 17 13:34:10 BST 2024 aarch64 GNU/Linux
.config used to compile the kernel, which is essentially the standard 2712 config with ath11k enabled, attached: kernel-config.zip
config.txt used on device:
dtoverlay=disable-wifi
dtoverlay=disable-bt
# For QCN9074
dtparam=pciex1
dtparam=pciex1_gen=3
# Force PCIe config to support 32bit DMA addresses at the expense of having to bounce buffers.
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3597
dtoverlay=pcie-32bit-dma
# Compatibility features
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3611
# no-mip: Use if a) more than 8 interrupt vectors are required or b) the EP requires DMA and MSI addresses to be 32bit.
dtoverlay=pciex1-compat-pi5,no-mip
# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on
# Enable audio (loads snd_bcm2835)
dtparam=audio=on
# Additional overlays and parameters are documented
# /boot/firmware/overlays/README
# Automatically load overlays for detected cameras
camera_auto_detect=1
# Automatically load overlays for detected DSI displays
display_auto_detect=1
# Automatically load initramfs files, if found
auto_initramfs=1
# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2
# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1
# Run in 64-bit mode
arm_64bit=1
# Disable compensation for displays with overscan
disable_overscan=1
# Run as fast as firmware / board allows
arm_boost=1
[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1
[cm5]
dtoverlay=dwc2,dr_mode=host
Logs
Working kit (Unit with 2GB RAM)
$ cat /proc/cpuinfo | grep "Model"
Model : Raspberry Pi 5 Model B Rev 1.0
$ free -m
total used free shared buff/cache available
Mem: 2009 254 1638 5 172 1754
Swap: 199 0 199
$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M
ath11k is loaded on boot:
$ dmesg | grep ath11k
[ 6.801102] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[ 6.801137] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 6.820708] ath11k_pci 0000:01:00.0: MSI vectors: 16
[ 6.820724] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[ 7.329153] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[ 7.329165] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
WiFi networks are listed:
$ nmcli dev wifi list
IN-USE BSSID SSID MODE CHAN RATE SIGNAL BAR>
5A:09:D4:FA:34:89 BTWi-fi Infra 40 405 Mbit/s 44 ▂▄_>
5A:09:D4:FA:34:8A BTWifi-X Infra 40 405 Mbit/s 40 ▂▄_>
4C:09:D4:FA:34:88 BTHub5-CMCS Infra 40 405 Mbit/s 37 ▂▄_>
EC:6C:9A:4A:61:54 BT-JWAKQR Infra 40 540 Mbit/s 27 ▂__>
...
nmcli used to connect to WiFi network:
$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '7a1e9176-f639-4ccf-8b19-c656fc9a1150'.
$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether 2c:cf:67:83:eb:b8 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether c4:93:00:3a:34:a2 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.194/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
valid_lft 86377sec preferred_lft 86377sec
inet6 fe80::a7c7:324f:bc91:522a/64 scope link noprefixroute
valid_lft forever preferred_lft forever
$ ping raspberrypi.com
PING raspberrypi.com (172.67.154.53) 56(84) bytes of data.
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=1 ttl=58 time=9.50 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=2 ttl=58 time=12.3 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=3 ttl=58 time=13.5 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 9.502/11.780/13.507/1.680 ms
Works as expected, no issue to report.
Non-working kit (Unit with 8GB RAM)
$ cat /proc/cpuinfo | grep "Model"
Model : Raspberry Pi 5 Model B Rev 1.0
$ free -m
total used free shared buff/cache available
Mem: 8052 300 7691 5 168 7752
Swap: 199 0 199
$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M
$ dmesg | grep ath11k
[ 7.140417] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[ 7.140444] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.140717] ath11k_pci 0000:01:00.0: MSI vectors: 16
[ 7.140728] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[ 7.590439] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[ 7.590449] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
$ nmcli dev wifi list
IN-USE BSSID SSID MODE CHAN RATE S>
4C:09:D4:FA:34:88 BTHub5-CMCS Infra 40 405 Mbit/s 3>
5A:09:D4:FA:34:8A BTWifi-X Infra 40 405 Mbit/s 3>
5A:09:D4:FA:34:89 BTWi-fi Infra 40 405 Mbit/s 3>
EC:6C:9A:4A:61:54 BT-JWAKQR Infra 40 540 Mbit/s 2>
62:6C:9A:4A:61:56 EE WiFi-X Infra 40 540 Mbit/s 2>
...
Trying to connect to a WiFi network results in a kernel panic:
$ sudo nmcli dev wifi connect <ap> password <password>
[ 123.832476] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 123.841313] Mem abort info:
[ 123.844114] ESR = 0x0000000096000145
[ 123.847909] EC = 0x25: DABT (current EL), IL = 32 bits
[ 123.853243] SET = 0, FnV = 0
[ 123.856304] EA = 0, S1PTW = 0
[ 123.859452] FSC = 0x05: level 1 translation fault
[ 123.864348] Data abort info:
[ 123.867234] ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[ 123.872742] CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[ 123.877811] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 123.883141] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000101bcc000
[ 123.889694] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 123.898432] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[ 123.904722] Modules linked in: michael_mic qrtr_mhi binfmt_misc qrtr ath11k_pci mhi ath11k qmi_helpers spidev mac80211 vc4 snd_soc_hdmi_codec drm_display_helper libarc4 cec cfg80211 drm_dma_helper sg drm_kms_helper snd_soc_core rpivid_hevc(C) aes_ce_blk pisp_be v4l2_mem2mem aes_ce_cipher snd_compress ghash_ce videobuf2_dma_contig gf128mul snd_pcm_dmaengine libaes rfkill videobuf2_memops snd_pcm videobuf2_v4l2 sha2_ce sha256_arm64 sha1_ce videodev snd_timer raspberrypi_hwmon videobuf2_common snd mc v3d i2c_brcmstb gpio_keys spi_bcm2835 gpu_sched raspberrypi_gpiomem pwm_fan rp1_adc drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[ 123.967462] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.6.56-v8-16k-x+ #1
[ 123.976108] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 123.981960] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 123.988947] pc : dcache_inval_poc+0x28/0x58
[ 123.993145] lr : arch_sync_dma_for_cpu+0x34/0x50
[ 123.997776] sp : ffffc00080003c40
[ 124.001095] x29: ffffc00080003c40 x28: ffff80010162c860 x27: ffffc00080003eb8
[ 124.008257] x26: ffffc00080003ce4 x25: 0000000000000000 x24: 0000000000000005
[ 124.015419] x23: 00000000000025f0 x22: 0000000000000040 x21: 0000000000000002
[ 124.022581] x20: ffff800100fab0c0 x19: ffffffffffffffff x18: 0000000000000000
[ 124.029743] x17: ffffb0017a7b8000 x16: ffffd000841375c8 x15: 00005555fa586b70
[ 124.036905] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 124.044067] x11: 00000000000000cf x10: 00000000000000c8 x9 : ffffd000841376c0
[ 124.051229] x8 : ffffc00080003d38 x7 : 0000000000000000 x6 : 0000000000000000
[ 124.058390] x5 : 00000001040c0000 x4 : ffff800104cc6820 x3 : 000000000000003f
[ 124.065552] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffffffffffffffff
[ 124.072714] Call trace:
[ 124.075161] dcache_inval_poc+0x28/0x58
[ 124.079006] dma_sync_single_for_cpu+0xf8/0x128
[ 124.083549] ath11k_hal_srng_prefetch_desc+0x6c/0xa0 [ath11k]
[ 124.089341] ath11k_hal_srng_access_begin+0x44/0x58 [ath11k]
[ 124.095038] ath11k_dp_process_rx+0xd0/0x3b8 [ath11k]
[ 124.100124] ath11k_dp_service_srng+0x32c/0x360 [ath11k]
[ 124.105471] ath11k_pcic_ext_grp_napi_poll+0x3c/0xd8 [ath11k]
[ 124.111254] __napi_poll+0x40/0x208
[ 124.114751] net_rx_action+0x2e0/0x338
[ 124.118508] handle_softirqs+0x118/0x360
[ 124.122440] __do_softirq+0x1c/0x28
[ 124.125935] ____do_softirq+0x18/0x30
[ 124.129605] call_on_irq_stack+0x24/0x58
[ 124.133536] do_softirq_own_stack+0x24/0x38
[ 124.137730] irq_exit_rcu+0x8c/0xd0
[ 124.141225] el1_interrupt+0x38/0x68
[ 124.144810] el1h_64_irq_handler+0x18/0x28
[ 124.148917] el1h_64_irq+0x64/0x68
[ 124.152325] default_idle_call+0x5c/0x170
[ 124.156344] do_idle+0x204/0x238
[ 124.159579] cpu_startup_entry+0x40/0x50
[ 124.163512] rest_init+0xec/0xf8
[ 124.166745] arch_call_rest_init+0x18/0x20
[ 124.170853] start_kernel+0x528/0x690
[ 124.174523] __primary_switched+0xbc/0xd0
[ 124.178544] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
[ 124.184658] ---[ end trace 0000000000000000 ]---
[ 124.189287] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 124.196186] SMP: stopping secondary CPUs
[ 124.200118] Kernel Offset: 0x100004000000 from 0xffffc00080000000
[ 124.206231] PHYS_OFFSET: 0x0
[ 124.209114] CPU features: 0x1,00000001,70028143,0000720b
[ 124.214442] Memory Limit: none
[ 124.217501] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
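For anyone reading the oops above, the ESR value can be split into its fields by hand. The following is a minimal sketch of that decoding, based on my reading of the Arm ESR_EL1 layout for data aborts (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with CM at ISS bit 8, WnR at ISS bit 6 and the fault status code in ISS bits [5:0]). decode_esr is a hypothetical helper name and only covers the fields printed in the log.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value for a data abort into a few named fields."""
    iss = esr & 0x1FFFFFF  # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,   # instruction length bit (1 = 32-bit instruction)
        "ISS": iss,
        "CM": (iss >> 8) & 0x1,    # fault came from a cache maintenance instruction
        "WnR": (iss >> 6) & 0x1,   # write-not-read
        "FSC": iss & 0x3F,         # fault status code
    }


fields = decode_esr(0x96000145)  # value taken from the oops above
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

The decoded values match what the kernel printed (EC = 0x25, ISS = 0x145, CM = 1, WnR = 1, FSC = 0x05). CM = 1 fits the pc being inside dcache_inval_poc, and the faulting address 0 matches the value of x1 in the register dump.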
Non-working 8GB unit made to work with mem=2G in cmdline.txt
$ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes mem=2G rootwait
$ free -m
total used free shared buff/cache available
Mem: 1947 250 1582 5 171 1697
Swap: 199 0 199
$ dmesg | grep ath11k
[ 7.862557] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[ 7.862603] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.863780] ath11k_pci 0000:01:00.0: MSI vectors: 16
[ 7.863795] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[ 8.310542] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[ 8.310551] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
$ nmcli dev wifi list
IN-USE BSSID SSID MODE CHAN RATE SIGNAL BAR>
EC:6C:9A:4A:61:54 BT-JWAKQR Infra 40 540 Mbit/s 29 ▂__>
62:6C:9A:4A:61:56 EE WiFi-X Infra 40 540 Mbit/s 25 ▂__>
62:6C:9A:4A:61:55 EE WiFi Infra 40 540 Mbit/s 25 ▂__>
<...>
$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '6ca93d62-f17e-4580-aaa4-f1dbe64a902b'.
$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether 2c:cf:67:67:8d:23 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether c4:93:00:3a:34:99 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.185/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
valid_lft 86384sec preferred_lft 86384sec
inet6 fe80::5a4e:f962:55b2:ca18/64 scope link noprefixroute
valid_lft forever preferred_lft forever
$ ping raspberrypi.com
PING raspberrypi.com (104.21.88.234) 56(84) bytes of data.
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=1 ttl=58 time=7.68 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=2 ttl=58 time=7.94 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=3 ttl=58 time=8.53 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 7.677/8.050/8.531/0.356 ms
By limiting the memory to 2GB, the 8GB unit works as expected.
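To double-check which memory limit is actually in effect, the mem= option can be pulled out of the kernel command line and converted to bytes. This is a simplified sketch: parse_mem_limit is a hypothetical helper, it only understands a plain size with an optional K/M/G/T suffix, and it is not a full reimplementation of the kernel's own parsing.

```python
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30, "T": 40}


def parse_mem_limit(cmdline: str):
    """Return the mem= limit in bytes, or None if the option is absent."""
    for token in cmdline.split():
        if not token.startswith("mem="):
            continue
        value = token[len("mem="):]
        if not value:
            continue  # ignore an empty "mem=" token
        suffix = value[-1].upper()
        if suffix in SUFFIX_SHIFT:
            return int(value[:-1], 0) << SUFFIX_SHIFT[suffix]
        return int(value, 0)  # no suffix: value is already in bytes
    return None


# Command line copied from the cmdline.txt shown above.
CMDLINE = ("console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 "
           "rootfstype=ext4 fsck.repair=yes mem=2G rootwait")

limit = parse_mem_limit(CMDLINE)
print(f"mem limit: {limit} bytes ({limit // (1 << 20)} MiB)")
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the hard-coded string.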
Additional context
I have tried various permutations of the following config options in cmdline.txt with no success:
- iommu=soft
- iommu.strict=1
- coherent_pool=1M