
insmod xdma.ko crashes on the first ioread32() after pci_iomap()

Open mazhenke opened this issue 1 year ago • 1 comment

I hit a blocking issue when running the XDMA driver on my Arm platform, an N1SDP.

On the U280, I generated an FPGA project with XDMA (screenshots of the design omitted).

I built the XDMA driver as a .ko without specifying `config_bar_num`, `xvc_bar_num`, or `xvc_bar_offset`. The driver then goes through `map_single_bar()` → `is_config_bar()`, and the kernel crashes (Asynchronous SError Interrupt, then panic) on the first `read_register()` in `is_config_bar()`: `irq_id = read_register(&irq_regs->identifier);`

The following is my log:

```
root@n1sdp:~/work#
root@n1sdp:~/work#
root@n1sdp:~/work# echo 8 > /proc/sys/kernel/printk
root@n1sdp:~/work#
root@n1sdp:~/work# cat /proc/sys/kernel/printk
8 4 1 7
root@n1sdp:~/work#
root@n1sdp:~/work# insmod xdma_log_io.ko
[ 36.409496] xdma: loading out-of-tree module taints kernel.
[ 36.415324] xdma: module verification failed: signature and/or required key missing - tainting kernel
[ 36.425053] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.2.2
[ 36.432004] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
[ 36.441118] xdma 0001:01:00.0: Adding to iommu group 1
[ 36.446293] xdma:xdma_device_open: xdma device 0001:01:00.0, 0x0000000032c59170.
[ 36.453689] xdma 0001:01:00.0: enabling device (0000 -> 0002)
[ 36.459473] xdma:map_single_bar: BAR0 at 0x69200000 mapped at 0xffff80000ba00000, length=65536(/65536)
[ 36.468769] xdma:read_register: read reg: 0xffff80000ba02000
[ 36.474420] SError Interrupt on CPU3, code 0x00000000be000411 -- SError
[ 36.474423] CPU: 3 PID: 453 Comm: insmod Tainted: G OE 6.3.3+ #1
[ 36.474425] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 36.474428] pc : xdma_device_open+0xe98/0x1498 [xdma]
[ 36.474439] lr : xdma_device_open+0xe90/0x1498 [xdma]
[ 36.474446] sp : ffff80000b9e3590
[ 36.474447] x29: ffff80000b9e3590 x28: 0000000000000000 x27: ffff0080009733d0
[ 36.474450] x26: ffff80000165bb80 x25: ffff80000ba00000 x24: 0000000000000000
[ 36.474452] x23: 0000000000000000 x22: ffff008000973000 x21: ffff80000165a350
[ 36.474454] x20: ffff008010acb048 x19: ffff80000ba02000 x18: 0000000000000000
[ 36.474456] x17: 3061623030303038 x16: 6666666678302074 x15: 612064657070616d
[ 36.474458] x14: 0000000000000000 x13: 3030303230616230 x12: 3030303866666666
[ 36.474460] x11: 7830203a67657220 x10: 64616572203a7265 x9 : ffff8000083efa8c
[ 36.474462] x8 : 64616572203a7265 x7 : 205d393637383634 x6 : ffff80000a82d550
[ 36.474464] x5 : 0000000000000000 x4 : ffff00837dfbad08 x3 : ffff800001652470
[ 36.474466] x2 : ffff80000164a9c8 x1 : ffff80000ba02000 x0 : 0000000000000020
[ 36.474468] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 36.474470] CPU: 3 PID: 453 Comm: insmod Tainted: G OE 6.3.3+ #1
[ 36.474472] Call trace:
[ 36.474473] dump_backtrace+0xac/0x138
[ 36.474477] show_stack+0x20/0x38
[ 36.474478] dump_stack_lvl+0x78/0xc8
[ 36.474482] dump_stack+0x18/0x28
[ 36.474484] panic+0x3d0/0x428
[ 36.474488] nmi_panic+0xb4/0xc0
[ 36.474490] arm64_serror_panic+0x78/0x90
[ 36.474491] do_serror+0x60/0x68
[ 36.474493] el1h_64_error_handler+0x3c/0x70
[ 36.474497] el1h_64_error+0x7c/0x80
[ 36.474499] xdma_device_open+0xe98/0x1498 [xdma]
[ 36.474506] probe_one+0x98/0x2b0 [xdma]
[ 36.474514] local_pci_probe+0x48/0xd0
[ 36.474518] pci_device_probe+0xb4/0x240
[ 36.474520] really_probe+0x198/0x400
[ 36.474525] __driver_probe_device+0x90/0x1b0
[ 36.474527] driver_probe_device+0x44/0x168
[ 36.474530] __driver_attach+0x104/0x250
[ 36.474532] bus_for_each_dev+0x7c/0xe8
[ 36.474536] driver_attach+0x2c/0x40
[ 36.474538] bus_add_driver+0x118/0x250
[ 36.474540] driver_register+0x68/0x138
[ 36.474542] __pci_register_driver+0x4c/0x60
[ 36.474545] xdma_mod_init+0x9c/0xb8 [xdma]
[ 36.474552] do_one_initcall+0x4c/0x2e0
[ 36.474554] do_init_module+0x50/0x210
[ 36.474558] load_module+0x21f4/0x24a8
[ 36.474560] __do_sys_finit_module+0xc4/0x148
[ 36.474562] __arm64_sys_finit_module+0x28/0x40
[ 36.474564] invoke_syscall+0x78/0x108
[ 36.474566] el0_svc_common.constprop.0+0x58/0x188
[ 36.474567] do_el0_svc+0x40/0xb8
[ 36.474568] el0_svc+0x34/0x138
[ 36.474571] el0t_64_sync_handler+0xb8/0xc0
[ 36.474573] el0t_64_sync+0x1a8/0x1b0
[ 36.474575] SMP: stopping secondary CPUs
[ 36.474579] Kernel Offset: 0x180000 from 0xffff800008000000
[ 36.474580] PHYS_OFFSET: 0x80000000
[ 36.474581] CPU features: 0x000000,20200506,3201720b
[ 36.474582] Memory Limit: none
[ 37.516082] pstore: backend (efi_pstore) writing error (-5)
[ 37.797990] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
```

mazhenke · Aug 30 '23 02:08

I added some logging to `map_single_bar()` and `read_register()`:

```c
static int map_single_bar(struct xdma_dev *xdev, struct pci_dev *dev, int idx)
{
	...
	...
	pr_info("BAR%d at 0x%llx mapped at 0x%llx, length=%llu(/%llu)\n",
		idx, (u64)bar_start, (u64)(xdev->bar[idx]), (u64)map_len,
		(u64)bar_len);
	...
	...
}

inline u32 read_register(void *iomem)
{
	pr_info("read reg: 0x%llx\n", (u64)iomem);

	return ioread32(iomem);
}
```

mazhenke · Aug 30 '23 02:08

Did you ever resolve this?

I had a similar kernel panic. For me, the fix was to set `config_bar_num` in the Makefile. Without it, the driver blindly probes the BARs to guess which one is the config BAR, and one of those probe reads can hit an invalid address and panic the kernel.

rcls · Oct 02 '24 19:10