udevd NaT consumption and warnings on IA-64

Open lenticularis39 opened this issue 3 years ago • 3 comments

While booting T2 on Itanium, udevd prints a warning like the one below for several modules. This one is for the SCSI module, but the same happens for Ethernet (e1000) and USB (stack shortened):

T2 SDE early useudevd[45]: starting version 182
rspace (c)2005-2021 Rene Rebe, ExactCODE GmbH; Germany.
Mounting /dev, /proc and /sys
Linux 5.14.6-t2, populating u/dev
------------[ cut here ]------------
WARNING: CPU: 0 PID: 54 at fs/proc/generic.c:406 __proc_create+0x5a0/0x600
name len 0
Modules linked in: scsi_mod(+) usb_common
CPU: 0 PID: 54 Comm: udevd Not tainted 5.14.6-t2 #1
Hardware name: hp server rx2620                   , BIOS 03.17                  
                                          03/31/2005                            

Call Trace:
 [<a000000100015070>] show_stack+0x90/0xc0
                                sp=e000000100bbfab0 bsp=e000000100bb9638
...
 [<a0000002001f8400>] scsi_init_procfs+0x80/0x480 [scsi_mod]
                                sp=e000000100bbfce0 bsp=e000000100bb93e0
 [<a0000002001f8020>] init_scsi+0x20/0x1a0 [scsi_mod]
                                sp=e000000100bbfce0 bsp=e000000100bb93c0
...
 [<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000000100bbfe30 bsp=e000000100bb9100
 [<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
                                sp=e000000100bc0000 bsp=e000000100bb9100
---[ end trace 0a4e30188a2ec52a ]---
------------[ cut here ]------------

There are also some NaT consumption warnings mixed in there:

udevd[48]: NaT consumption 17179869216 [2]

Along with worker terminated messages:

udevd[45]: seq 954 '/devices/pci0000:00/0000:00:01.0' killed
udevd[45]: worker [50] timeout, kill it
udevd[45]: seq 955 '/devices/pci0000:00/0000:00:01.1' killed
udevd[45]: worker [67] timeout, kill it
udevd[45]: seq 961 '/devices/pci0000:20/0000:20:02.0' killed
udevd[45]: worker [49] terminated by signal 9 (Killed)
udevd[45]: worker [50] terminated by signal 9 (Killed)
udevd[45]: worker [67] terminated by signal 9 (Killed)

It's not easy to determine exactly how these are related, since they are printed over each other, but they are clearly regressions in udev, the Linux kernel, or both.
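
The number printed after "NaT consumption" is easier to inspect in hexadecimal. The following is a minimal sketch, assuming the value is a bit field (for example a status register value passed along by the trap handler, which nothing in this report confirms). `set_bits` is a hypothetical helper written for this illustration, not part of any kernel or udev tooling.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Number printed after "NaT consumption" in the udevd[48] message above.
code = 17179869216

print(hex(code))       # 0x400000020
print(set_bits(code))  # [5, 34]
```

Only two bits are set (5 and 34), which suggests a flags value rather than an address. What those bits mean would have to be checked against the IA-64 architecture documentation.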

lenticularis39 Sep 24 '21 16:09

~None of these messages appear on a 4.4.285 kernel, so I assume there is a kernel regression that will have to be bisected.~ Edit: I accidentally compiled the kernel without the modules.

lenticularis39 Sep 30 '21 15:09

or it is a gcc / binutils toolchain regression, ...

rxrbln Sep 30 '21 15:09

Some further findings:

  • the error only happens with dynamically linked (loadable) modules; when the same code is built statically into the kernel, it works properly
  • the actual error appears to be an invalid memory access, with the NaT consumption being only a side effect
  • a similar problem appears with Python 3.10: its list allocator causes a segfault by accessing invalid memory after pointer arithmetic

lenticularis39 Nov 21 '21 17:11