unified-runtime icon indicating copy to clipboard operation
unified-runtime copied to clipboard

Inconsistent behavior between UR_DEVICE_INFO_UUID vs UR_DEVICE_INFO_PCI_ADDRESS

Open al42and opened this issue 2 years ago • 11 comments

Both UR_DEVICE_INFO_UUID and UR_DEVICE_INFO_PCI_ADDRESS return char[]. However, for PCI address, it is a pretty-printed string:

$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text PCI_ADDRESS
      PCI_ADDRESS: 0000:03:00.0
      PCI_ADDRESS: 0000:00:02.0
      PCI_ADDRESS: 0000:0a:00.0
      PCI_ADDRESS: 0000:0C:00.0

But for UUID, it's the raw 16 bytes of the UUID itself (and the reason grep needs extra flags, because they don't fit ASCII when treated as a string):

$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text UUID
      UUID: ���V
      UUID: ���F

      UUID: XX
      UUID: sa����"��%~�]�

Both of these parameters are containing an object that is, at its core, a fixed-length sequence of bytes, but which also has a well-esteblished string representation. I don't have any preferences, but it would be nice to have a uniform treatment, or at least make the docs clear what exactly is returned in each case. char[] return type, as opposed to unsigned char[] suggests a formatted string.

P.S.: It's not very relevant for GROMACS project (although, we're considering querying PCI Addresses for CPU-GPU affinity checks); just reporting a counterintuitive behavior.

al42and avatar Dec 11 '23 16:12 al42and

Thank you for the report!

Clearly urinfo should not be printing raw output like that, perhaps the type of UR_DEVICE_INFO_UUID should be uint8_t[] (this is what it is in Level Zero).

alycm avatar Jan 10 '24 15:01 alycm

Would it make sense to have PCI_ADDRESS also return raw data as uint8_t[] then? Just for consistency sake.

al42and avatar Jan 10 '24 16:01 al42and

To implemement the UR_DEVICE_INFO_PCI_ADDRESS query:

  • Level Zero is constructing a string from a returned ze_pci_address_ext_t
  • CUDA adapter uses cuDeviceGetPCIBusId which returns a string.
  • HIP adapter uses hipDeviceGetPCIBusId which returns a string.
  • OpenCL doesn't have an analagous query, or at least the adapter doesn't implement one if it has come into existance since being written.
  • Native CPU doesn't implement this query either.

I think from a practical viewpoint as an adapter maintainer I'd be inclinded to keep it as a string.

I could see the utility of this being a struct instead, as Level Zero does it, if a user wants to make logical choices based on those attributes. This would be extra work for CUDA/HIP. I don't think it will ever be implemented in all adapters though.

kbenzie avatar Jan 10 '24 16:01 kbenzie

OpenCL has the cl_khr_pci_bus_info extension, but OpenCL can be implemented on non-PCI devices, e.g. CPUs, so there will always be situations where this extension is not provided. Same situation as Native CPU adapter.

alycm avatar Jan 10 '24 17:01 alycm

Thanks for putting me on to cl_khr_pci_bus_info, I wasn't aware.

kbenzie avatar Jan 10 '24 17:01 kbenzie

CUDA adapter uses cuDeviceGetPCIBusId which returns a string.

One could get bus/device/function as integers from cudaGetDeviceProperties: https://docs.nvidia.com/cuda/archive/9.0/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1c5adc2ef8c6b89fb139b4684175db54a

Same for ROCm: https://docs.amd.com/projects/HIP/en/docs-5.2.0/doxygen/html/structhip_device_prop__t.html#aa0f598c0196ff1f429290a43c1fa4038

al42and avatar Jan 10 '24 19:01 al42and

Seems like switching to returning a struct for PCI bus info should be doable without parsing strings then.

As for how to print the UUID, seems that this comes from the NVIDIA drivers GPU UUID:

$ cat /proc/driver/nvidia/gpus/*/information
...
GPU UUID:        GPU-2ebb02f8-b748-9eed-a6d5-4d29cf68c055
...

So I think I'll replicate this foramt minus the GPU- prefix.

Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.

kbenzie avatar Jan 11 '24 15:01 kbenzie

https://github.com/oneapi-src/unified-runtime/pull/1243 fixes the UUID printing.

kbenzie avatar Jan 11 '24 16:01 kbenzie

Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.

See #1281

al42and avatar Jan 24 '24 16:01 al42and

Thanks @al42and!

kbenzie avatar Jan 24 '24 16:01 kbenzie

https://github.com/oneapi-src/unified-runtime/pull/1313 still requires more validation & intel/llvm changes but pushing before going on holiday for a week.

kbenzie avatar Feb 02 '24 14:02 kbenzie