expose SRIOV information
Expose information about SRIOV devices. After some design iterations, we settled on these goals:
- model SRIOV devices like GPUs, so with a top-level package.
- to shape the API, we start by checking what the k8s SRIOV operator needs
- to refine it further later, we will check what other projects, like the k8s node feature discovery, consume
- the goal is to enable such projects to switch to ghw if/when they want.
There are a few more noteworthy items in this PR:
- due to lack of resources, initial support is for Linux only. Nothing Linux-specific should have sneaked into the API. Should the current API design make it unnecessarily hard to add support for other platforms (e.g. Windows), this should be treated as a bug.
- the code prefers Physical Functions for discovery, but we acknowledge that forcing consumers to traverse the Physical Functions to learn about Virtual Functions is awkward, so the API provides shallow references to the Virtual Functions for the sake of practicality.
Fixes: https://github.com/jaypipes/ghw/issues/92
Signed-off-by: Francesco Romani [email protected]
Example output, PF device:
lssriov is the same code shown in the example added to the README (see)
./lssriov 0000:05:00.0 | jq
{
  "driver": "igb",
  "interfaces": [
    "enp5s0f0"
  ],
  "physfn": {
    "max_vf_num": 7,
    "vfs": [
      {
        "id": 0,
        "pci_address": "0000:05:10.0"
      },
      {
        "id": 1,
        "pci_address": "0000:05:10.4"
      },
      {
        "id": 2,
        "pci_address": "0000:05:11.0"
      },
      {
        "id": 3,
        "pci_address": "0000:05:11.4"
      }
    ]
  }
}
Example output, VF device:
lssriov is the same code shown in the example added to the README (see)
./lssriov 0000:05:10.0 | jq
{
  "driver": "igbvf",
  "interfaces": [
    "enp5s0f0v0"
  ],
  "virtfn": {
    "parent_pci_address": "0000:05:00.0"
  }
}
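A consumer of this JSON could decode both shapes with one set of types. The sketch below only mirrors the field names visible in the two example outputs; the Go type names (`sriovDevice`, `physFn`, `virtFn`, `vfRef`) are assumptions made for illustration and are not the types defined by this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vfRef mirrors one entry of the "vfs" list in the PF output.
type vfRef struct {
	ID         int    `json:"id"`
	PCIAddress string `json:"pci_address"`
}

// physFn mirrors the "physfn" object, present only for Physical Functions.
type physFn struct {
	MaxVFNum int     `json:"max_vf_num"`
	VFs      []vfRef `json:"vfs"`
}

// virtFn mirrors the "virtfn" object, present only for Virtual Functions.
type virtFn struct {
	ParentPCIAddress string `json:"parent_pci_address"`
}

// sriovDevice is a hypothetical container for either kind of device;
// exactly one of PhysFn and VirtFn is expected to be non-nil.
type sriovDevice struct {
	Driver     string   `json:"driver"`
	Interfaces []string `json:"interfaces"`
	PhysFn     *physFn  `json:"physfn,omitempty"`
	VirtFn     *virtFn  `json:"virtfn,omitempty"`
}

// decode parses one lssriov-style JSON document.
func decode(data []byte) (*sriovDevice, error) {
	var dev sriovDevice
	if err := json.Unmarshal(data, &dev); err != nil {
		return nil, err
	}
	return &dev, nil
}

func main() {
	raw := `{"driver":"igbvf","interfaces":["enp5s0f0v0"],"virtfn":{"parent_pci_address":"0000:05:00.0"}}`
	dev, err := decode([]byte(raw))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if dev.VirtFn != nil {
		fmt.Printf("%s is a VF of %s\n", dev.Interfaces[0], dev.VirtFn.ParentPCIAddress)
	}
}
```

Using pointers for the two sub-objects lets the caller tell a PF from a VF by a nil check instead of a separate discriminator field.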
We are converging on which information we should expose about SRIOV devices. The current set of attributes represents what other established components, for example the k8s SRIOV operator, expose. Please note that in the case of ghw, the same amount of data is split between the pci and net packages and this new SRIOV addition. I think this is a fair enough representation which nicely expands the current layout of ghw packages and provides a convenient enough API.
Of course, reviews and feedback are welcome to further improve it and to fill any gaps that may have escaped investigation up until now.
The biggest problem, however, is how to represent the SRIOV devices proper.
To the best of my knowledge, all SRIOV devices are also PCI devices. It is not possible, in the foreseeable future, to have SRIOV devices which aren't also PCI devices. This seems to suggest an is-a relationship between SRIOV and PCI.
So should SRIOV devices be presented as subclasses of PCI devices?
Should SRIOV be a subpackage of the pci package (like in the current PR)?
Or should SRIOV be an independent, same-level package, mimicking the relationship we currently have between the gpu and pci packages?
Had a chat with @jaypipes and my takeaways are:
- we agreed on modeling SRIOV devices like GPUs, that is, with a top-level package.
- we agreed that the k8s SRIOV operator is a good testbed for this API: if this new API (plus the other enhancements we did/are doing in ghw) is enough to cover their use case, our API is good enough.
- we will watch the k8s node feature discovery project and, if possible and convenient, enable them to consume ghw instead of writing their own code.
ok, test failure is unexpected. I'll have a look ASAP.
weird, can't reproduce inside a xenial container:
FROM ubuntu:xenial
RUN apt update
RUN apt install -y curl git make gcc
RUN curl -sL -o /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme && chmod 0755 /usr/local/bin/gimme
RUN gimme 1.16
WORKDIR /go/src/github.com/jaypipes/ghw
# Force the go compiler to use modules.
ENV GO111MODULE=on
ENV GOPROXY=direct
# go.mod and go.sum go into their own layers.
COPY go.mod .
COPY go.sum .
COPY . .
CMD /bin/bash
same with go1.14.15 (yep, copypasta here https://github.com/jaypipes/ghw/pull/230#issuecomment-810424943)
Interesting. We had apparently-random CI failures on xenial only.
I added multiple tests consuming the pci package, and the second/third (can't pinpoint further) was always failing regardless of the order of the tests. The code looks innocent and the stacktrace runs deep into the pcidb package. This change seems to fix everything:
// debug CI failures on travis on xenial
if err := os.Setenv("PCIDB_DISABLE_NETWORK_FETCH", "1"); err != nil {
	t.Fatalf("Cannot set PCIDB_DISABLE_NETWORK_FETCH")
}
Now I wonder if the default behaviour of pcidb should be reversed: leave the network alone unless explicitly requested. @jaypipes any thoughts? Is it worth an issue on the pcidb project for further discussion?
Well, this still has some merit, but it turns out it was itself a symptom. The real issue, with its fix, is captured in https://github.com/jaypipes/ghw/commit/62a092dc80531c41ceb9189f8a54ebc2b653c2b0
depends on https://github.com/jaypipes/ghw/pull/247 - intentionally NOT fixing the conflict to reflect we should not move forward until we discuss https://github.com/jaypipes/ghw/pull/247
back to WIP: I need to finish the rebase, and I'd also like to land https://github.com/jaypipes/ghw/pull/281 before finishing this PR.
All the deps of this PR have been merged! removing the WIP status ...but we had some bitrot. I'll take care ASAP.
@jaypipes at last! all the dependencies of this PR have been merged, CI is green and it's ready for a new review round!
Fixes: https://github.com/jaypipes/ghw/issues/92
Example output (long):
# ./ghwc -h
__
.-----. | |--. .--.--.--.
| _ | | | | | | |
|___ | |__|__| |________|
|_____|
Discover hardware information.
https://github.com/jaypipes/ghw
Usage:
ghwc [flags]
ghwc [command]
Available Commands:
baseboard Show baseboard information for the host system
bios Show BIOS information for the host system
block Show block storage information for the host system
chassis Show chassis information for the host system
cpu Show CPU information for the host system
gpu Show graphics/GPU information for the host system
help Help about any command
memory Show memory information for the host system
net Show network information for the host system
pci Show information about PCI devices on the host system
product Show product information for the host system
sriov Show SRIOV devices information for the host system
topology Show topology information for the host system
version Display the version of gofile
Flags:
--debug Enable or disable debug mode
-f, --format string Output format.
Choices are 'json','yaml', and 'human'. (default "human")
-h, --help help for ghwc
--pretty When outputting JSON, use indentation
Use "ghwc [command] --help" for more information about a command.
# ./ghwc sriov
sriov (2 phsyical 8 virtual devices)
physical function [affined to NUMA node 0]@0000:05:00.0 -> driver: 'igb' class: 'Network controller' vendor: 'Intel Corporation' product: 'I350 Gigabit Network Connection' with 4/7 virtual functions
physical function [affined to NUMA node 1]@0000:05:00.1 -> driver: 'igb' class: 'Network controller' vendor: 'Intel Corporation' product: 'I350 Gigabit Network Connection' with 4/7 virtual functions
# ./ghwc sriov -f yaml
sriov:
physical_functions:
- address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
interfaces:
- enp5s0f0
max_vf_num: 7
pci:
address: "0000:05:00.0"
class:
id: "02"
name: Network controller
driver: igb
product:
id: "1521"
name: I350 Gigabit Network Connection
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: AOC-SGP-i4
vendor:
id: "8086"
name: Intel Corporation
vfs:
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "0"
id: 0
interfaces:
- enp5s0f0v0
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:10.0"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "4"
id: 1
interfaces:
- enp5s0f0v1
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:10.4"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "0"
id: 2
interfaces:
- enp5s0f0v2
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:11.0"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "4"
id: 3
interfaces:
- enp5s0f0v3
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:11.4"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
interfaces:
- enp5s0f1
max_vf_num: 7
pci:
address: "0000:05:00.1"
class:
id: "02"
name: Network controller
driver: igb
product:
id: "1521"
name: I350 Gigabit Network Connection
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: AOC-SGP-i4
vendor:
id: "8086"
name: Intel Corporation
vfs:
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "1"
id: 0
interfaces:
- enp5s0f1v0
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:10.1"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "5"
id: 1
interfaces:
- enp5s0f1v1
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:10.5"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "1"
id: 2
interfaces:
- enp5s0f1v2
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:11.1"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "5"
id: 3
interfaces:
- enp5s0f1v3
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:11.5"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
virtual_functions:
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "0"
id: 0
interfaces:
- enp5s0f0v0
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:10.0"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "4"
id: 1
interfaces:
- enp5s0f0v1
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:10.4"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "0"
id: 2
interfaces:
- enp5s0f0v2
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:11.0"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "4"
id: 3
interfaces:
- enp5s0f0v3
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "0"
pci:
address: "0000:05:11.4"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "1"
id: 0
interfaces:
- enp5s0f1v0
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:10.1"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "10"
Domain: "0000"
Function: "5"
id: 1
interfaces:
- enp5s0f1v1
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:10.5"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "1"
id: 2
interfaces:
- enp5s0f1v2
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:11.1"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
- address:
Bus: "05"
Device: "11"
Domain: "0000"
Function: "5"
id: 3
interfaces:
- enp5s0f1v3
parent_address:
Bus: "05"
Device: "00"
Domain: "0000"
Function: "1"
pci:
address: "0000:05:11.5"
class:
id: "02"
name: Network controller
driver: igbvf
product:
id: "1520"
name: I350 Ethernet Controller Virtual Function
programming_interface:
id: "00"
name: unknown
revision: "0x01"
subclass:
id: "00"
name: Ethernet controller
subsystem:
id: "0000"
name: unknown
vendor:
id: "8086"
name: Intel Corporation
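The YAML above carries each address twice: once split into Domain, Bus, Device and Function, and once as a single string such as "0000:05:10.4". As an illustration of how the two forms relate, here is a small sketch that parses the string form; `pciAddress` and `parsePCIAddress` are hypothetical names, not ghw's API, and defaulting a missing domain to "0000" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pciAddress holds the four components shown in the YAML output.
type pciAddress struct {
	Domain, Bus, Device, Function string
}

// Matches [domain:]bus:device.function in hexadecimal, e.g. 0000:05:10.4
var pciAddrRE = regexp.MustCompile(
	`^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$`)

// parsePCIAddress splits a textual PCI address into its components.
// A missing domain defaults to "0000" (an assumption of this sketch).
func parsePCIAddress(s string) (pciAddress, error) {
	m := pciAddrRE.FindStringSubmatch(strings.ToLower(s))
	if m == nil {
		return pciAddress{}, fmt.Errorf("invalid PCI address %q", s)
	}
	domain := m[1]
	if domain == "" {
		domain = "0000"
	}
	return pciAddress{Domain: domain, Bus: m[2], Device: m[3], Function: m[4]}, nil
}

// String renders the canonical domain:bus:device.function form.
func (a pciAddress) String() string {
	return fmt.Sprintf("%s:%s:%s.%s", a.Domain, a.Bus, a.Device, a.Function)
}

func main() {
	addr, err := parsePCIAddress("0000:05:10.4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("domain=%s bus=%s device=%s function=%s\n",
		addr.Domain, addr.Bus, addr.Device, addr.Function)
}
```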
sorry for the delay, caused by winter holidays plus a crazy January. I'll resume work here ASAP, let's try this variation!
@fromanirh, the more I look at this patch, the more I think these changes belong in the pkg/pci package and not as a new pkg/sriov package. The fact is, SR-IOV is specific to PCI Express and therefore should really be a set of additional attributes on the existing pkg/pci.Device struct.
See inline for more explanation.
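To make the suggestion concrete, one way to read "additional attributes on the PCI device" is an optional, nil-able block attached to the device. The sketch below is only an illustration of that shape under that assumption; the types `device` and `sriovInfo` and their fields are hypothetical and do not reproduce ghw's pkg/pci.Device or the layout eventually adopted.

```go
package main

import "fmt"

// sriovInfo is a hypothetical set of SR-IOV attributes.
// For a Physical Function, MaxVFNum and VFAddrs are filled in;
// for a Virtual Function, only ParentAddr is.
type sriovInfo struct {
	MaxVFNum   int
	VFAddrs    []string
	ParentAddr string
}

// device is a stand-in for a PCI device record; SRIOV stays nil for
// devices without SR-IOV capability, so existing consumers are unaffected.
type device struct {
	Address string
	Driver  string
	SRIOV   *sriovInfo
}

// isPhysicalFunction reports whether the device carries PF attributes.
func (d device) isPhysicalFunction() bool {
	return d.SRIOV != nil && d.SRIOV.ParentAddr == ""
}

// isVirtualFunction reports whether the device points back to a parent PF.
func (d device) isVirtualFunction() bool {
	return d.SRIOV != nil && d.SRIOV.ParentAddr != ""
}

func main() {
	devs := []device{
		{Address: "0000:05:00.0", Driver: "igb",
			SRIOV: &sriovInfo{MaxVFNum: 7, VFAddrs: []string{"0000:05:10.0"}}},
		{Address: "0000:05:10.0", Driver: "igbvf",
			SRIOV: &sriovInfo{ParentAddr: "0000:05:00.0"}},
		{Address: "0000:00:1f.0", Driver: "lpc_ich"}, // no SR-IOV
	}
	for _, d := range devs {
		fmt.Printf("%s pf=%t vf=%t\n", d.Address, d.isPhysicalFunction(), d.isVirtualFunction())
	}
}
```

The trade-off compared with a separate top-level package is that every PCI consumer sees the extra field, but nobody has to join two listings by address to get the full picture.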
(at LONG last I finally got some time to resume working on this PR!) I don't have strong feelings here, so I'm fine modeling SRIOV devices as extra attributes of PCI devices. The only minor question this raises, however, is related to GPU devices, which sit in their own module. Do we want to absorb GPU devices into the PCI devices someday in the future?
Again, I don't have strong feelings here, but this (perceived?) inconsistency makes me think. Going to address your comments now, thanks for your review!
at long last, the alternate implementation is available: https://github.com/jaypipes/ghw/pull/315
we agreed to move forward with #315