node-feature-discovery

feature request: socket topology

Open vsoch opened this issue 1 year ago • 9 comments

hiya! I'm looking to get a mapping of which cores belong to which socket, akin to what hwloc does: https://www.open-mpi.org/projects/hwloc/doc/v0.9.3/

Right now it looks like nfd exposes cpu -> topology.socket_count, which is great, but that only gives the number of sockets, not which cores belong to each one. Would this be possible? Thanks!
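For illustration, here is a minimal sketch of the kind of socket-to-core mapping being asked for. It assumes a Linux node and reads the kernel's sysfs CPU topology files directly; it is not how nfd or hwloc gathers this data, and the function name `cores_by_socket` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def cores_by_socket(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Map each socket (physical package) ID to its sorted logical CPU numbers.

    Assumption: the tree follows the Linux sysfs layout, where every
    cpuN/topology/physical_package_id file holds the socket ID of CPU N.
    """
    sockets = defaultdict(list)
    for cpu_dir in Path(sysfs_cpu_root).glob("cpu[0-9]*"):
        package_file = cpu_dir / "topology" / "physical_package_id"
        if not package_file.is_file():
            continue  # e.g. an offline CPU without a topology directory
        cpu_number = int(cpu_dir.name[3:])  # "cpu12" -> 12
        socket_id = int(package_file.read_text().strip())
        sockets[socket_id].append(cpu_number)
    return {socket: sorted(cpus) for socket, cpus in sorted(sockets.items())}
```

Calling `cores_by_socket()` on a Linux host returns a dict keyed by socket ID. The root path is a parameter so the function can also be pointed at a copied or mocked sysfs tree.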

vsoch avatar Feb 24 '24 22:02 vsoch

I'm also not seeing basics like the number of physical vs. logical cores. That seems like something that should obviously be here?
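To make the physical vs. logical distinction concrete, here is a rough sketch that derives both counts from Linux sysfs. It assumes that `core_id` is unique only within a socket, so a physical core is identified by the pair (`physical_package_id`, `core_id`). The function name `count_cores` is hypothetical and this is not an nfd API.

```python
from pathlib import Path


def count_cores(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return (physical_cores, logical_cpus) found in a sysfs-style tree."""
    physical = set()
    logical = 0
    for topology in Path(sysfs_cpu_root).glob("cpu[0-9]*/topology"):
        package_file = topology / "physical_package_id"
        core_file = topology / "core_id"
        if not (package_file.is_file() and core_file.is_file()):
            continue  # skip CPUs that do not expose both files
        logical += 1
        # Hardware threads that share a core report the same
        # (socket, core) pair, so the set keeps one entry per physical core.
        socket_id = int(package_file.read_text().strip())
        core_id = int(core_file.read_text().strip())
        physical.add((socket_id, core_id))
    return len(physical), logical
```

When the logical count is higher than the physical count, the node has SMT (hyper-threading) enabled.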

vsoch avatar Feb 24 '24 23:02 vsoch

Node labels are not suitable for describing detailed hardware topology. I also think that the NodeFeature CRD isn't very good at that. There's a separate nfd-topology-updater daemon that would be a better target for this feature: it exposes the topology via a separate CRD called NodeResourceTopology. The topology-updater currently exposes NUMA nodes only, not other aspects of the HW topology.

What is your use case for this request? How are you planning to consume the information?

marquiz avatar Feb 26 '24 08:02 marquiz

I’m working on the compatibility specification for OCI and building extractor plugins and tools, for Kubernetes but also HPC. For a lot of HPC applications we want to know the detailed topology to best schedule work. https://www-hpc.cea.fr/tgcc-public/en/html/toc/fulldoc/Process_distribution_affinity_binding.html

vsoch avatar Feb 26 '24 09:02 vsoch

It could be that hwloc is a better fit for this. I found a library in Go, but it has a bug, so I opened an issue.

vsoch avatar Feb 26 '24 09:02 vsoch