t2sde
udevd NaT consumption and warnings on IA-64
While booting T2 on Itanium, udevd prints a warning like the following for several modules. The one shown is for the SCSI module, but the same happens for Ethernet (e1000) and USB (stack trace shortened):
T2 SDE early useudevd[45]: starting version 182
rspace (c)2005-2021 Rene Rebe, ExactCODE GmbH; Germany.
Mounting /dev, /proc and /sys
Linux 5.14.6-t2, populating u/dev
------------[ cut here ]------------
WARNING: CPU: 0 PID: 54 at fs/proc/generic.c:406 __proc_create+0x5a0/0x600
name len 0
Modules linked in: scsi_mod(+) usb_common
CPU: 0 PID: 54 Comm: udevd Not tainted 5.14.6-t2 #1
Hardware name: hp server rx2620 , BIOS 03.17 03/31/2005
Call Trace:
[<a000000100015070>] show_stack+0x90/0xc0
sp=e000000100bbfab0 bsp=e000000100bb9638
...
[<a0000002001f8400>] scsi_init_procfs+0x80/0x480 [scsi_mod]
sp=e000000100bbfce0 bsp=e000000100bb93e0
[<a0000002001f8020>] init_scsi+0x20/0x1a0 [scsi_mod]
sp=e000000100bbfce0 bsp=e000000100bb93c0
...
[<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
sp=e000000100bbfe30 bsp=e000000100bb9100
[<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
sp=e000000100bc0000 bsp=e000000100bb9100
---[ end trace 0a4e30188a2ec52a ]---
------------[ cut here ]------------
There are also some NaT consumption warnings mixed in there:
udevd[48]: NaT consumption 17179869216 [2]
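A minimal sketch for decoding the number in that line, assuming (from my reading of `arch/ia64/kernel/traps.c`) that it is the Interruption Status Register (ISR) value handed to `die()`, and that the bracketed `[2]` is the kernel's running oops counter. The bit positions below follow the Itanium architecture manual as I understand it. They are assumptions to check against the SDM, not verified facts.

```python
# Sketch: decode the value from "udevd[48]: NaT consumption 17179869216 [2]",
# ASSUMING it is the IA-64 Interruption Status Register (ISR).
# Bit positions are taken from the Itanium architecture manual as I read it;
# verify against the SDM before relying on the result.

ISR_X = 1 << 32   # fault on instruction fetch (execute)
ISR_W = 1 << 33   # fault on a write access
ISR_R = 1 << 34   # fault on a read access
ISR_NA = 1 << 35  # fault raised by a non-access instruction (e.g. probe, fc)

# ISR.code{7:4} for the NaT Consumption vector (assumed encoding)
NAT_KINDS = {1: "register NaT consumption", 2: "NaT page consumption"}

def decode_isr(isr: int) -> dict:
    code = isr & 0xFFFF          # ISR.code, bits 15:0
    kind = (code >> 4) & 0xF     # ISR.code{7:4} selects the NaT consumption kind
    return {
        "code": code,
        "kind": NAT_KINDS.get(kind, f"unknown ({kind})"),
        "read": bool(isr & ISR_R),
        "write": bool(isr & ISR_W),
        "exec": bool(isr & ISR_X),
        "non_access": bool(isr & ISR_NA),
    }

if __name__ == "__main__":
    value = 17179869216
    print(hex(value), decode_isr(value))
```

17179869216 is `0x400000020`. Under these assumptions only the read bit is set and `ISR.code{7:4}` is 2, i.e. a read that hit a NaT page. That would agree with the finding that the real problem is an invalid memory access rather than a stray NaT bit in a register.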
In addition, there are messages about workers being killed:
udevd[45]: seq 954 '/devices/pci0000:00/0000:00:01.0' killed
udevd[45]: worker [50] timeout, kill it
udevd[45]: seq 955 '/devices/pci0000:00/0000:00:01.1' killed
udevd[45]: worker [67] timeout, kill it
udevd[45]: seq 961 '/devices/pci0000:20/0000:20:02.0' killed
udevd[45]: worker [49] terminated by signal 9 (Killed)
udevd[45]: worker [50] terminated by signal 9 (Killed)
udevd[45]: worker [67] terminated by signal 9 (Killed)
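The "timeout, kill it" / "terminated by signal 9 (Killed)" pairs read like a supervisor enforcing a time limit on its workers: signal 9 is SIGKILL sent by udevd itself, not a crash inside the worker. The sketch below only illustrates that pattern. It is not udev's code, the hanging worker is a hypothetical stand-in for a stuck module load, the timeout is arbitrary, and it assumes a POSIX system.

```python
# Illustration of the supervisor pattern suggested by the udevd log.
# NOT udev's implementation; POSIX only (SIGKILL, negative return codes).
import signal
import subprocess
import sys

def run_worker(timeout: float) -> int:
    # Hypothetical worker that hangs, standing in for a stuck module load.
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        return worker.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"worker [{worker.pid}] timeout, kill it")
        worker.kill()            # sends SIGKILL on POSIX
        status = worker.wait()   # reap the child; death by signal N is reported as -N
        if status < 0:
            print(f"worker [{worker.pid}] terminated by signal {-status} "
                  f"({signal.strsignal(-status)})")
        return status

if __name__ == "__main__":
    run_worker(timeout=0.5)
```

So the "Killed" lines are most likely a consequence of workers hanging, not a separate failure.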
It is hard to tell exactly how the proc warnings, the NaT consumptions and the worker kills are related, since their output is interleaved, but they clearly point to a regression in udev, the Linux kernel, or both.
~None of these messages appear on a 4.4.285 kernel, so I assume there is a kernel regression that will have to be bisected.~ Edit: I had accidentally compiled that kernel without the modules.
Alternatively, it could be a GCC / binutils toolchain regression, ...
Some further findings:
- the error only happens with loadable modules; when the same drivers are built into the kernel, they work properly
- the actual error appears to be an invalid memory access, with the NaT consumption being only a side effect
- a similar problem appears with Python 3.10: its list allocator causes a segfault by accessing invalid memory after pointer arithmetic
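The `name len 0` warning in the first trace comes, as far as I can tell from the kernel source, from a sanity check in `__proc_create()` that rejects a proc entry whose final name component is empty or 256 bytes or longer. One way a module can pass an "empty" name while its source contains a valid string literal is by reading that literal through a bad pointer, which would fit the finding that only loadable modules access invalid memory. The model below is a simplified, hypothetical Python rendering of that check. The parent-directory lookup the kernel also performs is skipped, and the `scsi/scsi` name and the fake rodata layout are illustrative assumptions.

```python
# Hypothetical, simplified model of the name check in __proc_create()
# (fs/proc/generic.c). Parent directory resolution is omitted.

def read_c_string(memory: bytes, offset: int) -> bytes:
    """Read a NUL-terminated string starting at offset, as strlen() would see it."""
    end = memory.index(b"\0", offset)
    return memory[offset:end]

def check_proc_name(name: bytes) -> int:
    leaf = name.rsplit(b"/", 1)[-1]   # final path component: b"scsi/scsi" -> b"scsi"
    if len(leaf) == 0 or len(leaf) >= 256:
        # The kernel emits WARN(1, "name len %u\n", ...) and returns NULL here.
        raise ValueError(f"name len {len(leaf)}")
    return len(leaf)

if __name__ == "__main__":
    # Pretend module rodata: two padding NUL bytes, then the string literal.
    rodata = b"\0\0scsi/scsi\0"
    print(check_proc_name(read_c_string(rodata, 2)))  # correct pointer
    try:
        check_proc_name(read_c_string(rodata, 0))     # pointer off by two bytes
    except ValueError as err:
        print(err)
```

With the correct offset the check passes (length 4); with a pointer that lands on a NUL byte it fails with the same `name len 0` message seen in the log.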