Half of the cores do not belong to any NUMA node

Open xemul opened this issue 4 years ago • 12 comments

What version of hwloc are you using?

2.2.0 from Fedora-33, but this also reproduces on master. Git bisect points to bc9f1ffa2889e46e7391420787276ff03f3967b8 as the first bad commit.

Which operating system and hardware are you running on?

Fedora-33, kernel 5.8.17-300.fc33.x86_64

  • Post the output of lstopo - if it works

This is what the issue is about. With version 2.1.0 the output was:

Machine (126GB total) + Package L#0
  Group0 L#0
    NUMANode L#0 (P#0 63GB)
    L3 L#0 (8192KB)
      L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#24)
      L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#25)
      L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2
        PU L#4 (P#2)
        PU L#5 (P#26)
    L3 L#1 (8192KB)
      L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3
        PU L#6 (P#3)
        PU L#7 (P#27)
      L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (64KB) + Core L#4
        PU L#8 (P#4)
        PU L#9 (P#28)
      L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (64KB) + Core L#5
        PU L#10 (P#5)
        PU L#11 (P#29)
    HostBridge
      PCIBridge
        PCI 01:00.1 (SATA)
        PCIBridge
          PCIBridge
            PCI 03:00.0 (Network)
              Net "wlp3s0"
          PCIBridge
            PCI 04:00.0 (Network)
              Net "wlp4s0"
          PCIBridge
            PCI 05:00.0 (Ethernet)
              Net "enp5s0"
          PCIBridge
            PCI 07:00.0 (Ethernet)
              Net "enp7s0"
      PCIBridge
        PCI 0a:00.2 (SATA)
  Group0 L#1
    NUMANode L#1 (P#2 63GB)
    L3 L#2 (8192KB)
      L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (64KB) + Core L#6
        PU L#12 (P#6)
        PU L#13 (P#30)
      L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (64KB) + Core L#7
        PU L#14 (P#7)
        PU L#15 (P#31)
      L2 L#8 (512KB) + L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8
        PU L#16 (P#8)
        PU L#17 (P#32)
    L3 L#3 (8192KB)
      L2 L#9 (512KB) + L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9
        PU L#18 (P#9)
        PU L#19 (P#33)
      L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#34)
      L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#35)
    HostBridge
      PCIBridge
        PCI 41:00.0 (NVMExp)
          Block(Disk) "nvme0n1"
      PCIBridge
        PCI 42:00.0 (VGA)
      PCIBridge
        PCI 44:00.2 (SATA)
  Group0 L#2
    NUMANode L#2 (P#1)
    L3 L#4 (8192KB)
      L2 L#12 (512KB) + L1d L#12 (32KB) + L1i L#12 (64KB) + Core L#12
        PU L#24 (P#12)
        PU L#25 (P#36)
      L2 L#13 (512KB) + L1d L#13 (32KB) + L1i L#13 (64KB) + Core L#13
        PU L#26 (P#13)
        PU L#27 (P#37)
      L2 L#14 (512KB) + L1d L#14 (32KB) + L1i L#14 (64KB) + Core L#14
        PU L#28 (P#14)
        PU L#29 (P#38)
    L3 L#5 (8192KB)
      L2 L#15 (512KB) + L1d L#15 (32KB) + L1i L#15 (64KB) + Core L#15
        PU L#30 (P#15)
        PU L#31 (P#39)
      L2 L#16 (512KB) + L1d L#16 (32KB) + L1i L#16 (64KB) + Core L#16
        PU L#32 (P#16)
        PU L#33 (P#40)
      L2 L#17 (512KB) + L1d L#17 (32KB) + L1i L#17 (64KB) + Core L#17
        PU L#34 (P#17)
        PU L#35 (P#41)
  Group0 L#3
    NUMANode L#3 (P#3)
    L3 L#6 (8192KB)
      L2 L#18 (512KB) + L1d L#18 (32KB) + L1i L#18 (64KB) + Core L#18
        PU L#36 (P#18)
        PU L#37 (P#42)
      L2 L#19 (512KB) + L1d L#19 (32KB) + L1i L#19 (64KB) + Core L#19
        PU L#38 (P#19)
        PU L#39 (P#43)
      L2 L#20 (512KB) + L1d L#20 (32KB) + L1i L#20 (64KB) + Core L#20
        PU L#40 (P#20)
        PU L#41 (P#44)
    L3 L#7 (8192KB)
      L2 L#21 (512KB) + L1d L#21 (32KB) + L1i L#21 (64KB) + Core L#21
        PU L#42 (P#21)
        PU L#43 (P#45)
      L2 L#22 (512KB) + L1d L#22 (32KB) + L1i L#22 (64KB) + Core L#22
        PU L#44 (P#22)
        PU L#45 (P#46)
      L2 L#23 (512KB) + L1d L#23 (32KB) + L1i L#23 (64KB) + Core L#23
        PU L#46 (P#23)
        PU L#47 (P#47)

On 2.2.0, Group0 L#2 and L#3 are missing the NUMANode line, like this:

Machine (126GB total) + Package L#0
  Group0 L#0
    NUMANode L#0 (P#0 63GB)
    L3 L#0 (8192KB)
      L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#24)
      L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#25)
      L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2
        PU L#4 (P#2)
        PU L#5 (P#26)
    L3 L#1 (8192KB)
      L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3
        PU L#6 (P#3)
        PU L#7 (P#27)
      L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (64KB) + Core L#4
        PU L#8 (P#4)
        PU L#9 (P#28)
      L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (64KB) + Core L#5
        PU L#10 (P#5)
        PU L#11 (P#29)
    HostBridge
      PCIBridge
        PCI 01:00.1 (SATA)
        PCIBridge
          PCIBridge
            PCI 03:00.0 (Network)
              Net "wlp3s0"
          PCIBridge
            PCI 04:00.0 (Network)
              Net "wlp4s0"
          PCIBridge
            PCI 05:00.0 (Ethernet)
              Net "enp5s0"
          PCIBridge
            PCI 07:00.0 (Ethernet)
              Net "enp7s0"
      PCIBridge
        PCI 0a:00.2 (SATA)
  Group0 L#1
    NUMANode L#1 (P#2 63GB)
    L3 L#2 (8192KB)
      L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (64KB) + Core L#6
        PU L#12 (P#6)
        PU L#13 (P#30)
      L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (64KB) + Core L#7
        PU L#14 (P#7)
        PU L#15 (P#31)
      L2 L#8 (512KB) + L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8
        PU L#16 (P#8)
        PU L#17 (P#32)
    L3 L#3 (8192KB)
      L2 L#9 (512KB) + L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9
        PU L#18 (P#9)
        PU L#19 (P#33)
      L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#34)
      L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#35)
    HostBridge
      PCIBridge
        PCI 41:00.0 (NVMExp)
          Block(Disk) "nvme0n1"
      PCIBridge
        PCI 42:00.0 (VGA)
      PCIBridge
        PCI 44:00.2 (SATA)
  Group0 L#2
    L3 L#4 (8192KB)
      L2 L#12 (512KB) + L1d L#12 (32KB) + L1i L#12 (64KB) + Core L#12
        PU L#24 (P#12)
        PU L#25 (P#36)
      L2 L#13 (512KB) + L1d L#13 (32KB) + L1i L#13 (64KB) + Core L#13
        PU L#26 (P#13)
        PU L#27 (P#37)
      L2 L#14 (512KB) + L1d L#14 (32KB) + L1i L#14 (64KB) + Core L#14
        PU L#28 (P#14)
        PU L#29 (P#38)
    L3 L#5 (8192KB)
      L2 L#15 (512KB) + L1d L#15 (32KB) + L1i L#15 (64KB) + Core L#15
        PU L#30 (P#15)
        PU L#31 (P#39)
      L2 L#16 (512KB) + L1d L#16 (32KB) + L1i L#16 (64KB) + Core L#16
        PU L#32 (P#16)
        PU L#33 (P#40)
      L2 L#17 (512KB) + L1d L#17 (32KB) + L1i L#17 (64KB) + Core L#17
        PU L#34 (P#17)
        PU L#35 (P#41)
  Group0 L#3
    L3 L#6 (8192KB)
      L2 L#18 (512KB) + L1d L#18 (32KB) + L1i L#18 (64KB) + Core L#18
        PU L#36 (P#18)
        PU L#37 (P#42)
      L2 L#19 (512KB) + L1d L#19 (32KB) + L1i L#19 (64KB) + Core L#19
        PU L#38 (P#19)
        PU L#39 (P#43)
      L2 L#20 (512KB) + L1d L#20 (32KB) + L1i L#20 (64KB) + Core L#20
        PU L#40 (P#20)
        PU L#41 (P#44)
    L3 L#7 (8192KB)
      L2 L#21 (512KB) + L1d L#21 (32KB) + L1i L#21 (64KB) + Core L#21
        PU L#42 (P#21)
        PU L#43 (P#45)
      L2 L#22 (512KB) + L1d L#22 (32KB) + L1i L#22 (64KB) + Core L#22
        PU L#44 (P#22)
        PU L#45 (P#46)
      L2 L#23 (512KB) + L1d L#23 (32KB) + L1i L#23 (64KB) + Core L#23
        PU L#46 (P#23)
        PU L#47 (P#47)

Details of the problem

  • What happened?

lstopo-no-graphics is missing 2 NUMA nodes and thus doesn't assign half of the cores to any NUMA node.
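
The difference between the two outputs above can be spotted mechanically. The following is a minimal sketch, assuming the textual lstopo layout shown in this issue, where each NUMANode line appears under the Group0 it belongs to. The function name `groups_without_numa` is made up for illustration and is not part of hwloc.

```python
import re


def groups_without_numa(lstopo_text):
    """Return logical indexes of Group0 objects that have no NUMANode line.

    Assumption: a NUMANode line belongs to the most recent Group0 line
    above it, as in the lstopo text output quoted in this issue.
    """
    missing = []
    current = None    # logical index of the Group0 currently being scanned
    has_numa = False  # whether a NUMANode line was seen for that Group0
    for line in lstopo_text.splitlines():
        stripped = line.strip()
        match = re.match(r"Group0 L#(\d+)", stripped)
        if match:
            # A new Group0 starts: record the previous one if it had no node.
            if current is not None and not has_numa:
                missing.append(current)
            current = int(match.group(1))
            has_numa = False
        elif stripped.startswith("NUMANode") and current is not None:
            has_numa = True
    # Do not forget the last Group0 in the output.
    if current is not None and not has_numa:
        missing.append(current)
    return missing
```

Feeding it the 2.2.0 output from this report should list Group0 L#2 and L#3, while the 2.1.0 output should give an empty list.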

  • How did you start your process?

hwloc-ls from Fedora, which is a symlink to lstopo-no-graphics

  • How did it fail? Crash? Unexpected result?

Unexpected result, as described above.

Additional information

If your issue consists in a wrong topology detection, we also need the following for debugging remotely:

  • On Linux, run hwloc-gather-topology myhost and post the myhost.* files that it will generate. Note that this tool may be slow on large nodes or when I/O is enabled.

myhost.output.gz myhost.xml.gz myhost.tar.bz2.gz (sorry about .bz2.gz, GitHub doesn't accept .bz2 attachments)

xemul avatar Nov 03 '20 19:11 xemul

Hello,

The NUMA nodes that disappeared in 2.2 have 0kB of memory available to this process. Since hwloc 2.0, NUMA nodes are really meant to describe the available memory, which is why they don't show up anymore. I guess this memory config was set up with cgroups v2, which would explain why 2.1 got confused about it (it reported NUMA nodes with 0kB instead of no NUMA nodes).

Brice
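
One way to check which nodes the kernel itself reports as having memory is to read sysfs directly. This is a minimal sketch, assuming the usual Linux layout under `/sys/devices/system/node` with per-node `meminfo` and `cpulist` files. The function names are made up for illustration and are not part of hwloc. It shows machine-wide totals, not what a cgroup allows a given process to use.

```python
import glob
import os
import re


def parse_mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from a per-node meminfo file."""
    # Per-node lines look like: "Node 0 MemTotal:       65869996 kB"
    match = re.search(r"MemTotal:\s+(\d+)\s*kB", meminfo_text)
    return int(match.group(1)) if match else None


def node_memory(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node number -> (MemTotal in kB, cpulist string)."""
    result = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "meminfo")) as f:
            total_kb = parse_mem_total_kb(f.read())
        with open(os.path.join(path, "cpulist")) as f:
            cpus = f.read().strip()
        result[node] = (total_kb, cpus)
    return result
```

A node whose total is 0 here has CPUs but no local memory, which matches the kind of node that hwloc 2.2 no longer displays.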

bgoglin avatar Nov 03 '20 19:11 bgoglin

OK, thanks for the clarification. Probably a stupid question then, but still -- does it mean that PUs 24 through 47 really do not have any local NUMA nodes and treat both existing ones as "remote"?

xemul avatar Nov 03 '20 20:11 xemul

Yes. This cgroup config is sort of strange because of this. Is this something auto-configured by your Fedora? Or by the administrator?

bgoglin avatar Nov 03 '20 23:11 bgoglin

By Fedora, because the administrator is me and I didn't do any configuration there.

xemul avatar Nov 04 '20 06:11 xemul

The other possibility is that your hardware doesn't have any memory in these nodes. How many memory DIMMs do you have installed? Your processor has 4 memory channels, basically one for each NUMA node. Maybe you have only 2 DIMMs, or maybe your DIMMs are on only 2 channels? Something like this will tell you how many DIMMs you have:

sudo dmidecode --type 17 | grep "Form Factor: DIMM"

(type 17 is Memory Device, but it returns empty slots too. Form Factor is unknown when the slot is empty)
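
The separate greps for form factor, locator and size can be combined into a single pass. This is a minimal sketch, assuming the usual `dmidecode --type 17` layout of one "Memory Device" heading followed by indented `Key: Value` lines, and assuming empty slots report a size of "No Module Installed". The function names are made up for illustration.

```python
def parse_memory_devices(dmidecode_text):
    """Split `dmidecode --type 17` output into one dict per Memory Device."""
    devices = []
    current = None
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            # Start of a new slot description.
            current = {}
            devices.append(current)
        elif current is not None and ": " in stripped:
            key, value = stripped.split(": ", 1)
            current[key] = value
    return devices


def populated_by_channel(devices):
    """Count populated slots per 'Bank Locator' value."""
    counts = {}
    for dev in devices:
        size = dev.get("Size", "")
        # Assumption: empty slots report "No Module Installed" as their size.
        if not size or size.startswith("No Module"):
            continue
        bank = dev.get("Bank Locator", "unknown")
        counts[bank] = counts.get(bank, 0) + 1
    return counts
```

The text to parse could be captured with `subprocess.run(["dmidecode", "--type", "17"], capture_output=True, text=True)` when run as root. Even when every channel shows populated slots, that alone does not say which NUMA node each channel is attached to.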

bgoglin avatar Nov 04 '20 06:11 bgoglin

$ sudo dmidecode --type 17 | grep "Form Factor: DIMM"
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM
	Form Factor: DIMM

xemul avatar Nov 04 '20 06:11 xemul

$ sudo dmidecode --type 17 | grep 'Locator'
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL A
	Locator: DIMM 1
	Bank Locator: P0 CHANNEL A
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL B
	Locator: DIMM 1
	Bank Locator: P0 CHANNEL B
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL C
	Locator: DIMM 1
	Bank Locator: P0 CHANNEL C
	Locator: DIMM 0
	Bank Locator: P0 CHANNEL D
	Locator: DIMM 1
	Bank Locator: P0 CHANNEL D

xemul avatar Nov 04 '20 06:11 xemul

Can you grep for Size too?

bgoglin avatar Nov 04 '20 07:11 bgoglin

$ sudo dmidecode --type 17 | grep 'Size'
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB
	Size: 16 GB

xemul avatar Nov 04 '20 07:11 xemul

Ok, all memory is enabled then, nothing is disabled by cgroups. Maybe these processors have channels connected to only half of their internal nodes.

bgoglin avatar Nov 04 '20 07:11 bgoglin

So it's a hardware configuration, right? This explanation suits me even if it cannot be fixed :)

xemul avatar Nov 04 '20 07:11 xemul

Possibly, I seem to remember I've seen that once in the past on AMD Ryzen. I asked AMD, I'll let you know their answer.

bgoglin avatar Nov 04 '20 08:11 bgoglin

I am closing this since the mystery was resolved (missing DIMMs on 2 quarters of the CPU) and mentioned in other issues.

bgoglin avatar Mar 08 '23 08:03 bgoglin