[Issue]: device_id values returned in roctracer_record_t are off by 2. Devices are enumerated 2 to 9 instead of 0 to 7.

Open aaronenyeshi opened this issue 1 year ago • 2 comments

Problem Description

Hi, we are using Roctracer to capture GPU events via roctracer_record_t, with hcc_cb_properties.buffer_callback_fun = activity_callback;. However, we've found that the device_id values in these events run from 2 to 9. When using hipGetDeviceProperties, we observe ids from 0 to 7.

Why is this off by 2? Here is our workaround: https://github.com/pytorch/kineto/pull/925

Our Implementation:

Obtain roctracer_record_t and device_id here: https://github.com/pytorch/kineto/blob/cc24537ac461f08597fab3192e59a3952719d7a2/libkineto/src/RoctracerLogger.cpp#L313

Store as int type: https://github.com/pytorch/kineto/blob/cc24537ac461f08597fab3192e59a3952719d7a2/libkineto/src/RoctracerLogger.h#L179

Matches roctracer activity_record_s: https://github.com/ROCm/roctracer/blob/amd-master/inc/ext/prof_protocol.h#L83

Operating System

CentOS Stream 9

CPU

AMD EPYC 7713

GPU

AMD Instinct MI250

ROCm Version

ROCm 6.0.1

ROCm Component

roctracer

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

aaronenyeshi, May 03 '24 20:05

@aaronenyeshi Internal ticket has been created to investigate this issue. Thanks!

ppanchad-amd, Aug 15 '24 17:08

Hi @aaronenyeshi, as you've noted in https://github.com/pytorch/kineto/pull/926, this is due to roctracer enumerating the CPU as well as the GPU devices. This is by design; roctracer is pulling the node ids provided by the kernel driver as it is the most convenient way to get unique device ids, while hipGetDeviceProperties is simply enumerating the GPUs as its function is to report information for the GPUs. However, this isn't clearly documented, and I can see how these device ids could be expected to match, so we're updating the docs to indicate this. Thanks for bringing this to our attention!

schung-amd, Aug 22 '24 17:08
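
For readers who need the two numbering schemes to line up, the explanation above (roctracer reports driver node ids that count CPUs as well as GPUs, while HIP numbers only the GPUs) suggests a translation step on the consumer side. The following is a minimal C++ sketch of that idea, not the implementation used in the linked kineto PRs. `NodeKind`, `buildNodeToGpuIndex`, and `toGpuIndex` are hypothetical names that belong to neither roctracer nor HIP. The sketch assumes the caller already knows, for each node id, whether it is a CPU or a GPU, and that GPU ordinals follow node-id order. That second assumption may not hold if device visibility or ordering has been changed.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical tag for one topology node, listed in node-id order.
enum class NodeKind { Cpu, Gpu };

// Build a lookup from node id (the value seen in a record's device_id)
// to a zero-based GPU ordinal. Assumption: node ids are indices into
// `nodes`, and GPU ordinals follow node-id order.
std::unordered_map<int, int> buildNodeToGpuIndex(const std::vector<NodeKind>& nodes) {
  std::unordered_map<int, int> nodeToGpu;
  int nextGpu = 0;
  for (std::size_t nodeId = 0; nodeId < nodes.size(); ++nodeId) {
    if (nodes[nodeId] == NodeKind::Gpu) {
      nodeToGpu[static_cast<int>(nodeId)] = nextGpu++;
    }
  }
  return nodeToGpu;
}

// Translate a node id to a GPU ordinal; returns -1 for CPU or unknown nodes.
int toGpuIndex(const std::unordered_map<int, int>& nodeToGpu, int nodeId) {
  auto it = nodeToGpu.find(nodeId);
  return it == nodeToGpu.end() ? -1 : it->second;
}
```

With a node list of two CPU nodes followed by eight GPU nodes, which is one layout consistent with the ids 2 to 9 reported in this issue, node id 2 maps to GPU 0 and node id 9 maps to GPU 7. Building the map from the actual node list, rather than subtracting a fixed 2, avoids hard-coding the CPU count of one particular machine.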