ath11k kernel panic on Pi 5 with 8GB RAM, but not on 2GB (DMA/PCI-E kernel panic)

Open omerk opened this issue 1 year ago • 0 comments

Describe the bug

The ath11k kernel module works as expected on a Raspberry Pi 5 board with 2GB RAM, but the same image (same boot media) fails with a DMA/memory-related kernel panic on the 8GB unit.

Limiting the memory of the 8GB unit to 2GB (mem=2G in cmdline.txt) fixes the issue. Detailed logs below.

Test setup consists of two units:

  • Unit 1: Raspberry Pi 5, 2GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)
  • Unit 2: Raspberry Pi 5, 8GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)

WiFi modules used are identical brand/model and from the same batch, boot media is shared across the two units to ensure there are no config-related issues.

I am not entirely sure this is specifically an issue with the ath11k driver, as it seems to work on other platforms. Perhaps this is a BCM2712 DMA / PCI-E restriction? Speculating, of course. Thanks in advance for your assistance.
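As a rough sanity check on the RAM-size angle, the sketch below compares the top of RAM against the 4 GiB reach of a 32-bit DMA address. It assumes, as a simplification, that RAM is one contiguous range starting at physical address 0 (the panic log reports PHYS_OFFSET: 0x0, but the real memory map may have holes). `exceeds_32bit_dma` is a hypothetical helper name, not part of any existing tool.

```shell
# Hypothetical helper: does RAM of the given size (in GiB) extend past
# the 4 GiB boundary that a 32-bit DMA address can reach?
# Simplifying assumption: RAM is contiguous and starts at physical address 0.
exceeds_32bit_dma() {
    ram_bytes=$(( $1 * 1024 * 1024 * 1024 ))
    limit=$(( 1 << 32 ))
    if [ "$ram_bytes" -gt "$limit" ]; then
        echo yes
    else
        echo no
    fi
}

exceeds_32bit_dma 2   # prints "no"
exceeds_32bit_dma 8   # prints "yes"
```

Under that assumption, only the 8GB unit has memory that a 32-bit-limited device cannot address directly. This does not prove that this is the cause of the panic.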

Steps to reproduce the behaviour

(On a fresh device, with no WiFi configuration)

  • Compile, install and boot up the custom kernel
  • Use nmcli to connect to a WiFi network
  • Observe the kernel panic

(With a valid WiFi configuration set up)

  • Boot device with custom kernel
  • Observe the kernel panic

Device(s)

Raspberry Pi 5

System

Kernel version:

$ git rev-parse HEAD
84ab77459e61c648299d32464127b89ca65de40a

$ uname -a
Linux raspberrypi 6.6.56-v8-16k-x+ #1 SMP PREEMPT Thu Oct 17 13:34:10 BST 2024 aarch64 GNU/Linux

The .config used to compile the kernel (essentially the standard 2712 config with ath11k enabled) is attached: kernel-config.zip
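To confirm that a given .config has the driver enabled, something like the following could be used. It assumes that "ath11k enabled" means `CONFIG_ATH11K` and `CONFIG_ATH11K_PCI` are set to `y` or `m`; the attached config may set further options. `has_ath11k` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if both the ath11k core and its PCI
# front end are built in (y) or built as modules (m) in the given .config.
# Assumption: these two options are what "ath11k enabled" refers to.
has_ath11k() {
    cfg=$1
    grep -Eq '^CONFIG_ATH11K=(y|m)$' "$cfg" &&
        grep -Eq '^CONFIG_ATH11K_PCI=(y|m)$' "$cfg"
}
```

Typical use from the kernel build directory would be `has_ath11k .config && echo "ath11k enabled"`.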

config.txt used on device:

dtoverlay=disable-wifi
dtoverlay=disable-bt

# For QCN9074
dtparam=pciex1
dtparam=pciex1_gen=3

# Force PCIe config to support 32bit DMA addresses at the expense of having to bounce buffers.
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3597
dtoverlay=pcie-32bit-dma

# Compatibility features
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3611
# no-mip: Use if a) more than 8 interrupt vectors are required or b) the EP requires DMA and MSI addresses to be 32bit.
dtoverlay=pciex1-compat-pi5,no-mip

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# Additional overlays and parameters are documented
# /boot/firmware/overlays/README

# Automatically load overlays for detected cameras
camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[cm5]
dtoverlay=dwc2,dr_mode=host

Logs

Working kit (Unit with 2GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            2009         254        1638           5         172        1754
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

ath11k is loaded on boot:

$ dmesg | grep ath11k
[    6.801102] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    6.801137] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.820708] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    6.820724] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.329153] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.329165] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

WiFi networks are listed:

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        5A:09:D4:FA:34:89  BTWi-fi         Infra  40    405 Mbit/s  44      ▂▄_>
        5A:09:D4:FA:34:8A  BTWifi-X        Infra  40    405 Mbit/s  40      ▂▄_>
        4C:09:D4:FA:34:88  BTHub5-CMCS     Infra  40    405 Mbit/s  37      ▂▄_>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  27      ▂__>
...

nmcli used to connect to WiFi network:

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '7a1e9176-f639-4ccf-8b19-c656fc9a1150'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:83:eb:b8 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.194/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86377sec preferred_lft 86377sec
    inet6 fe80::a7c7:324f:bc91:522a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (172.67.154.53) 56(84) bytes of data.
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=1 ttl=58 time=9.50 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=2 ttl=58 time=12.3 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=3 ttl=58 time=13.5 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 9.502/11.780/13.507/1.680 ms

Works as expected, no issue to report.

Non-working kit (Unit with 8GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            8052         300        7691           5         168        7752
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

$ dmesg | grep ath11k
[    7.140417] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.140444] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.140717] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.140728] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.590439] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.590449] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID                      MODE   CHAN  RATE        S>
        4C:09:D4:FA:34:88  BTHub5-CMCS               Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:8A  BTWifi-X                  Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:89  BTWi-fi                   Infra  40    405 Mbit/s  3>
        EC:6C:9A:4A:61:54  BT-JWAKQR                 Infra  40    540 Mbit/s  2>
        62:6C:9A:4A:61:56  EE WiFi-X                 Infra  40    540 Mbit/s  2>
...

Trying to connect to a WiFi network results in a kernel panic:

$ sudo nmcli dev wifi connect <ap> password <password>
[  123.832476] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  123.841313] Mem abort info:
[  123.844114]   ESR = 0x0000000096000145
[  123.847909]   EC = 0x25: DABT (current EL), IL = 32 bits
[  123.853243]   SET = 0, FnV = 0
[  123.856304]   EA = 0, S1PTW = 0
[  123.859452]   FSC = 0x05: level 1 translation fault
[  123.864348] Data abort info:
[  123.867234]   ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[  123.872742]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[  123.877811]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  123.883141] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000101bcc000
[  123.889694] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  123.898432] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[  123.904722] Modules linked in: michael_mic qrtr_mhi binfmt_misc qrtr ath11k_pci mhi ath11k qmi_helpers spidev mac80211 vc4 snd_soc_hdmi_codec drm_display_helper libarc4 cec cfg80211 drm_dma_helper sg drm_kms_helper snd_soc_core rpivid_hevc(C) aes_ce_blk pisp_be v4l2_mem2mem aes_ce_cipher snd_compress ghash_ce videobuf2_dma_contig gf128mul snd_pcm_dmaengine libaes rfkill videobuf2_memops snd_pcm videobuf2_v4l2 sha2_ce sha256_arm64 sha1_ce videodev snd_timer raspberrypi_hwmon videobuf2_common snd mc v3d i2c_brcmstb gpio_keys spi_bcm2835 gpu_sched raspberrypi_gpiomem pwm_fan rp1_adc drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[  123.967462] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C         6.6.56-v8-16k-x+ #1
[  123.976108] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  123.981960] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  123.988947] pc : dcache_inval_poc+0x28/0x58
[  123.993145] lr : arch_sync_dma_for_cpu+0x34/0x50
[  123.997776] sp : ffffc00080003c40
[  124.001095] x29: ffffc00080003c40 x28: ffff80010162c860 x27: ffffc00080003eb8
[  124.008257] x26: ffffc00080003ce4 x25: 0000000000000000 x24: 0000000000000005
[  124.015419] x23: 00000000000025f0 x22: 0000000000000040 x21: 0000000000000002
[  124.022581] x20: ffff800100fab0c0 x19: ffffffffffffffff x18: 0000000000000000
[  124.029743] x17: ffffb0017a7b8000 x16: ffffd000841375c8 x15: 00005555fa586b70
[  124.036905] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  124.044067] x11: 00000000000000cf x10: 00000000000000c8 x9 : ffffd000841376c0
[  124.051229] x8 : ffffc00080003d38 x7 : 0000000000000000 x6 : 0000000000000000
[  124.058390] x5 : 00000001040c0000 x4 : ffff800104cc6820 x3 : 000000000000003f
[  124.065552] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffffffffffffffff
[  124.072714] Call trace:
[  124.075161]  dcache_inval_poc+0x28/0x58
[  124.079006]  dma_sync_single_for_cpu+0xf8/0x128
[  124.083549]  ath11k_hal_srng_prefetch_desc+0x6c/0xa0 [ath11k]
[  124.089341]  ath11k_hal_srng_access_begin+0x44/0x58 [ath11k]
[  124.095038]  ath11k_dp_process_rx+0xd0/0x3b8 [ath11k]
[  124.100124]  ath11k_dp_service_srng+0x32c/0x360 [ath11k]
[  124.105471]  ath11k_pcic_ext_grp_napi_poll+0x3c/0xd8 [ath11k]
[  124.111254]  __napi_poll+0x40/0x208
[  124.114751]  net_rx_action+0x2e0/0x338
[  124.118508]  handle_softirqs+0x118/0x360
[  124.122440]  __do_softirq+0x1c/0x28
[  124.125935]  ____do_softirq+0x18/0x30
[  124.129605]  call_on_irq_stack+0x24/0x58
[  124.133536]  do_softirq_own_stack+0x24/0x38
[  124.137730]  irq_exit_rcu+0x8c/0xd0
[  124.141225]  el1_interrupt+0x38/0x68
[  124.144810]  el1h_64_irq_handler+0x18/0x28
[  124.148917]  el1h_64_irq+0x64/0x68
[  124.152325]  default_idle_call+0x5c/0x170
[  124.156344]  do_idle+0x204/0x238
[  124.159579]  cpu_startup_entry+0x40/0x50
[  124.163512]  rest_init+0xec/0xf8
[  124.166745]  arch_call_rest_init+0x18/0x20
[  124.170853]  start_kernel+0x528/0x690
[  124.174523]  __primary_switched+0xbc/0xd0
[  124.178544] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
[  124.184658] ---[ end trace 0000000000000000 ]---
[  124.189287] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  124.196186] SMP: stopping secondary CPUs
[  124.200118] Kernel Offset: 0x100004000000 from 0xffffc00080000000
[  124.206231] PHYS_OFFSET: 0x0
[  124.209114] CPU features: 0x1,00000001,70028143,0000720b
[  124.214442] Memory Limit: none
[  124.217501] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
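The "Mem abort info" lines can be cross-checked by splitting the ESR value by hand. The sketch below uses the field positions from the Arm ESR_ELx data-abort layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with FSC, WnR and CM inside the ISS). `decode_esr` is a hypothetical helper name, not a kernel or distro tool.

```shell
# Hypothetical helper: split an AArch64 ESR_ELx value into the fields
# the kernel prints for a data abort.
decode_esr() {
    esr=$(( $1 ))
    ec=$((  (esr >> 26) & 0x3f ))     # exception class
    il=$((  (esr >> 25) & 0x1 ))      # 1 = 32-bit instruction
    iss=$((  esr & 0x1ffffff ))       # instruction specific syndrome
    fsc=$((  iss & 0x3f ))            # fault status code
    wnr=$(( (iss >> 6) & 0x1 ))       # write, not read
    cm=$((  (iss >> 8) & 0x1 ))       # cache maintenance
    printf 'EC=0x%02x IL=%d ISS=0x%08x FSC=0x%02x WnR=%d CM=%d\n' \
        "$ec" "$il" "$iss" "$fsc" "$wnr" "$cm"
}

decode_esr 0x96000145
# prints: EC=0x25 IL=1 ISS=0x00000145 FSC=0x05 WnR=1 CM=1
```

For 0x96000145 this reproduces the values in the log. CM = 1 marks an abort raised by a cache maintenance or address translation instruction, which is consistent with the reported pc in dcache_inval_poc.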

Non-working 8GB unit made to work with mem=2G in cmdline.txt

$ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes mem=2G rootwait

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            1947         250        1582           5         171        1697
Swap:            199           0         199

$ dmesg | grep ath11k
[    7.862557] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.862603] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.863780] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.863795] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    8.310542] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    8.310551] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  29      ▂__>
        62:6C:9A:4A:61:56  EE WiFi-X       Infra  40    540 Mbit/s  25      ▂__>
        62:6C:9A:4A:61:55  EE WiFi         Infra  40    540 Mbit/s  25      ▂__>
<...>

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '6ca93d62-f17e-4580-aaa4-f1dbe64a902b'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:67:8d:23 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.185/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86384sec preferred_lft 86384sec
    inet6 fe80::5a4e:f962:55b2:ca18/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (104.21.88.234) 56(84) bytes of data.
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=1 ttl=58 time=7.68 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=2 ttl=58 time=7.94 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=3 ttl=58 time=8.53 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 7.677/8.050/8.531/0.356 ms

By limiting the memory to 2GB, the 8GB unit works as expected.
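For anyone wanting to apply the same workaround, a minimal sketch follows. `add_mem_limit` is a hypothetical helper that appends a `mem=` limit to a single-line command line file unless one is already present. It appends at the end of the line rather than before `rootwait` as in the file shown above; parameter order is not expected to matter here. The demo runs on a temporary copy, not on /boot/firmware/cmdline.txt.

```shell
# Hypothetical helper: add "mem=<limit>" to a kernel command line file.
# cmdline.txt must stay a single line, so the parameter is appended to line 1.
add_mem_limit() {
    file=$1
    limit=$2
    if grep -Eq '(^| )mem=' "$file"; then
        echo "mem= already set in $file" >&2
        return 0
    fi
    sed "1s/\$/ mem=$limit/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a scratch file holding the command line without the limit.
demo=$(mktemp)
echo 'console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes rootwait' > "$demo"
add_mem_limit "$demo" 2G
cat "$demo"
rm -f "$demo"
```

On the device itself the call would be run as root against /boot/firmware/cmdline.txt, followed by a reboot.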

Additional context

I have tried various permutations of the following config options in cmdline.txt with no success:

  • iommu=soft
  • iommu.strict=1
  • coherent_pool=1M
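
Since none of these options changed the outcome, it may be worth confirming that each one actually reached the kernel. A minimal sketch: `cmdline_has` is a hypothetical helper that checks a command line string for an exact, space-delimited parameter. The sample string below is illustrative; on the device it would come from /proc/cmdline.

```shell
# Hypothetical helper: report whether a parameter appears, as a whole
# space-delimited word, in a kernel command line string.
cmdline_has() {
    case " $1 " in
        *" $2 "*) echo yes ;;
        *)        echo no ;;
    esac
}

sample='console=tty1 iommu=soft coherent_pool=1M rootwait'
cmdline_has "$sample" iommu=soft       # prints "yes"
cmdline_has "$sample" iommu.strict=1   # prints "no"
```

On a running system the first argument would be `"$(cat /proc/cmdline)"`.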

omerk, Oct 17 '24 16:10