Selecting default and several nodes with --best-memattr

Open antoine-morvan opened this issue 1 year ago • 5 comments

What version of hwloc are you using?

2.10.0

Which operating system and hardware are you running on?

RHEL 8; Linux 4.18

Details of the problem

Hello,

I would like to allocate memory on the nodes that show the best value for a given attribute among the following:

$ lstopo --memattrs
Memory attribute #0 name `Capacity' flags 1
  NUMANode L#0 = 33613537280
  NUMANode L#1 = 33813360640
  NUMANode L#2 = 33813364736
  NUMANode L#3 = 33800777728
  NUMANode L#4 = 33813364736
  NUMANode L#5 = 33813360640
  NUMANode L#6 = 33767444480
  NUMANode L#7 = 33802883072
Memory attribute #1 name `Locality' flags 2
  NUMANode L#0 = 32
  NUMANode L#1 = 32
  NUMANode L#2 = 32
  NUMANode L#3 = 32
  NUMANode L#4 = 32
  NUMANode L#5 = 32
  NUMANode L#6 = 32
  NUMANode L#7 = 32
Memory attribute #2 name `Bandwidth' flags 5
Memory attribute #4 name `ReadBandwidth' flags 5
Memory attribute #5 name `WriteBandwidth' flags 5
Memory attribute #3 name `Latency' flags 6
Memory attribute #6 name `ReadLatency' flags 6
Memory attribute #7 name `WriteLatency' flags 6

Several attributes (Bandwidth, Latency and their read/write variants) have no values on this machine, which causes hwloc-calc to report no best memory for them:

# No memory reported when the attribute has no value:
$ hwloc-calc --oo --local-memory --best-memattr Latency machine:0

# Working fine when attribute has value :
$ hwloc-calc --oo --local-memory --best-memattr Capacity machine:0
NUMANode:2

Whereas I would like it to print the first one (or, better, all of them; see below).

Also, when all nodes have the same value for a given attribute, this command only returns the first one.

# Only the first node is returned when all values are equal:
$ hwloc-calc --oo --local-memory --best-memattr Locality socket:0
NUMANode:0

Whereas they are actually all best memory.

So I am asking for two things:

  • When no attribute value is available, could we have a default, with all nodes treated as having the same value, so that hwloc-calc returns something?
  • Could we have a mode or a new flag that makes --best-memattr return a list of nodes whenever they share the same value?
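The two requests above can be sketched as a small selection rule. This is only an illustration of the requested behavior, not hwloc's implementation: `best_nodes`, `default_all` and `candidates` are hypothetical names, and the values are copied from the `lstopo --memattrs` output earlier in this report. The flag bits follow the `flags` column of that output (1 = higher is better, 2 = lower is better, 4 = values depend on an initiator).

```python
# Flag bits as printed by `lstopo --memattrs`.
HIGHER_FIRST = 1 << 0    # larger values are better (Capacity, Bandwidth)
LOWER_FIRST = 1 << 1     # smaller values are better (Locality, Latency)
NEED_INITIATOR = 1 << 2  # values depend on the initiator (unused in this sketch)


def best_nodes(values, flags, default_all=False, candidates=()):
    """Return every node index tied for the best value of one attribute.

    values: dict mapping node index -> attribute value (may be empty).
    candidates: nodes to fall back to when default_all is set and no
    value exists. Both names are hypothetical, for illustration only.
    """
    if not values:
        return sorted(candidates) if default_all else []
    pick = max if flags & HIGHER_FIRST else min
    best = pick(values.values())
    return sorted(node for node, value in values.items() if value == best)


capacity = {0: 33613537280, 1: 33813360640, 2: 33813364736, 3: 33800777728,
            4: 33813364736, 5: 33813360640, 6: 33767444480, 7: 33802883072}
locality = {node: 32 for node in range(8)}

print(best_nodes(capacity, flags=1))   # nodes 2 and 4 tie for the largest capacity
print(best_nodes(locality, flags=2))   # every node has the same locality
print(best_nodes({}, flags=6))         # no Latency values: nothing to report
print(best_nodes({}, flags=6, default_all=True, candidates=range(8)))
```

Note that with the Capacity values above, nodes 2 and 4 are tied, which is consistent with the current command printing only NUMANode:2.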

Best.

antoine-morvan avatar Feb 13 '24 18:02 antoine-morvan

Hello. In the case where you say "they are all best memory", you are using --local-memory. How many nodes are actually local to "socket:0" here? Only NUMANode:0, or others as well? Returning a list of nodes is certainly possible. The current hwloc-calc option is based on the API, which returns a single best node. Extending it is possible, but it would require an additional option such as --multiple-best. Once we have that, returning all nodes when they share the same non-existent value should be easy too.

bgoglin avatar Feb 14 '24 07:02 bgoglin

Oops, I forgot the topology. It's a dual-socket machine with 4 NUMA nodes per socket:

image

antoine-morvan avatar Feb 14 '24 07:02 antoine-morvan

Here's a proposal for hwloc-calc (there's no change in the API yet, although I initially thought it would be strictly required).

On an SPR+HBM machine in SNC-4 mode, we now return the 4 local HBMs when asking for the best-bandwidth nodes near an entire socket:

$ hwloc-calc --local-memory --best-memattr bandwidth socket:1 --oo --sep " "
NUMANode:9 NUMANode:11 NUMANode:13 NUMANode:15

Previous releases returned nothing. That behavior can still be obtained by adding the strict parameter, as in --best-memattr bandwidth,strict, which means: only return memory targets whose best initiator contains the input one.
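A toy model may help explain why the strict rule returns nothing for a whole socket in SNC-4. The sketch below is not hwloc code: the function names, CPU numbering and bandwidth figures are made up, and the non-strict rule (the best initiator only has to overlap the input) is an assumption, since only the strict rule is defined above.

```python
def best_initiator(perf, higher_is_better=True):
    """Return the initiator cpuset with the best value for one target.

    perf: dict mapping a frozenset of CPU indexes -> attribute value.
    """
    pick = max if higher_is_better else min
    return pick(perf, key=perf.get)


def filter_targets(targets, wanted, strict):
    """Keep targets depending on how their best initiator relates to `wanted`.

    strict: the best initiator must contain the whole input cpuset.
    non-strict: it only has to overlap it (an assumption of this sketch).
    """
    kept = []
    for name, perf in targets.items():
        best = best_initiator(perf)
        ok = wanted <= best if strict else bool(wanted & best)
        if ok:
            kept.append(name)
    return kept


# Made-up socket: 8 CPUs split into 4 SNC quarters, one HBM node per quarter.
quarters = [frozenset({0, 1}), frozenset({2, 3}),
            frozenset({4, 5}), frozenset({6, 7})]
targets = {f"NUMANode:{9 + 2 * i}": {q: 200} for i, q in enumerate(quarters)}
socket = frozenset(range(8))

print(filter_targets(targets, socket, strict=True))        # no quarter contains the socket
print(filter_targets(targets, socket, strict=False))       # all four HBM nodes
print(filter_targets(targets, quarters[0], strict=True))   # only the HBM of that quarter
```

In this model, each HBM's best initiator is a single SNC quarter, which can never contain the full socket, so the strict test rejects every target.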

There's also a default flag to return all nodes if no best is found. For instance on my laptop:

$ hwloc-calc --local-memory --best-memattr bandwidth socket:0 --oo 

$ hwloc-calc --local-memory --best-memattr bandwidth,default socket:0 --oo 
NUMANode:0

If that answers your need, I'll clean up and document all this before preparing a PR.

bgoglin avatar Mar 13 '24 16:03 bgoglin

It looks like this does indeed answer my needs 👍

antoine-morvan avatar Mar 14 '24 15:03 antoine-morvan

Tarball should be available for testing at https://ci.inria.fr/hwloc/job/basic/job/PR-657/ soon

bgoglin avatar Mar 21 '24 14:03 bgoglin

Fixed in upcoming 2.11, thanks for the report.

bgoglin avatar Jun 03 '24 13:06 bgoglin

I am posting 2.11rc1 right now with this fix.

bgoglin avatar Jun 17 '24 10:06 bgoglin