Inconsistent behavior between UR_DEVICE_INFO_UUID vs UR_DEVICE_INFO_PCI_ADDRESS
Both UR_DEVICE_INFO_UUID and UR_DEVICE_INFO_PCI_ADDRESS return char[]. However, for PCI address, it is a pretty-printed string:
$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text PCI_ADDRESS
PCI_ADDRESS: 0000:03:00.0
PCI_ADDRESS: 0000:00:02.0
PCI_ADDRESS: 0000:0a:00.0
PCI_ADDRESS: 0000:0C:00.0
But for UUID, it's the raw 16 bytes of the UUID itself (and the reason grep needs extra flags, because they don't fit ASCII when treated as a string):
$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text UUID
UUID: ���V
UUID: ���F
UUID: XX
UUID: sa����"��%~�]�
Both of these parameters are containing an object that is, at its core, a fixed-length sequence of bytes, but which also has a well-esteblished string representation. I don't have any preferences, but it would be nice to have a uniform treatment, or at least make the docs clear what exactly is returned in each case. char[] return type, as opposed to unsigned char[] suggests a formatted string.
P.S.: It's not very relevant for GROMACS project (although, we're considering querying PCI Addresses for CPU-GPU affinity checks); just reporting a counterintuitive behavior.
Thank you for the report!
Clearly urinfo should not be printing raw output like that, perhaps the type of UR_DEVICE_INFO_UUID should be uint8_t[] (this is what it is in Level Zero).
Would it make sense to have PCI_ADDRESS also return raw data as uint8_t[] then? Just for consistency sake.
To implemement the UR_DEVICE_INFO_PCI_ADDRESS query:
- Level Zero is constructing a string from a returned
ze_pci_address_ext_t - CUDA adapter uses
cuDeviceGetPCIBusIdwhich returns a string. - HIP adapter uses
hipDeviceGetPCIBusIdwhich returns a string. - OpenCL doesn't have an analagous query, or at least the adapter doesn't implement one if it has come into existance since being written.
- Native CPU doesn't implement this query either.
I think from a practical viewpoint as an adapter maintainer I'd be inclinded to keep it as a string.
I could see the utility of this being a struct instead, as Level Zero does it, if a user wants to make logical choices based on those attributes. This would be extra work for CUDA/HIP. I don't think it will ever be implemented in all adapters though.
OpenCL has the cl_khr_pci_bus_info extension, but OpenCL can be implemented on non-PCI devices, e.g. CPUs, so there will always be situations where this extension is not provided. Same situation as Native CPU adapter.
Thanks for putting me on to cl_khr_pci_bus_info, I wasn't aware.
CUDA adapter uses cuDeviceGetPCIBusId which returns a string.
One could get bus/device/function as integers from cudaGetDeviceProperties: https://docs.nvidia.com/cuda/archive/9.0/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1c5adc2ef8c6b89fb139b4684175db54a
Same for ROCm: https://docs.amd.com/projects/HIP/en/docs-5.2.0/doxygen/html/structhip_device_prop__t.html#aa0f598c0196ff1f429290a43c1fa4038
Seems like switching to returning a struct for PCI bus info should be doable without parsing strings then.
As for how to print the UUID, seems that this comes from the NVIDIA drivers GPU UUID:
$ cat /proc/driver/nvidia/gpus/*/information
...
GPU UUID: GPU-2ebb02f8-b748-9eed-a6d5-4d29cf68c055
...
So I think I'll replicate this foramt minus the GPU- prefix.
Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.
https://github.com/oneapi-src/unified-runtime/pull/1243 fixes the UUID printing.
Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.
See #1281
Thanks @al42and!
https://github.com/oneapi-src/unified-runtime/pull/1313 still requires more validation & intel/llvm changes but pushing before going on holiday for a week.