rocprofiler [Issue]: rocprofv2 gets more kernel dispatches than rocprofv1

Problem Description

When using API tracing on vLLM inference, rocprof (v1) records fewer kernel dispatches than rocprofv2. Which result is more likely to be correct, and what are the possible reasons for the mismatch between the v1 and v2 kernel records?

Operating System

OS: NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)"

CPU

CPU: model name : Hygon C86 3380 8-core Processor

GPU

AMD Instinct MI250

ROCm Version

ROCm 5.7.1

ROCm Component

rocprofiler

Steps to Reproduce

Run rocprof and rocprofv2 with hip-trace and kernel-trace on a vLLM inference app.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Aug 26 '24 07:08 hgtsoi

Hi @hgtsoi , can you provide more detailed reproduction steps? I am not sure exactly what "vllm inference app" refers to. If you are running a specific workload, can you please provide the binary (or source repo would be even better) so we can reproduce the issue internally and diagnose it.

Sep 04 '24 13:09 jamesxu2

rocprof uses a method built into HIP to trace kernels, which effectively amounts to HIP reporting back to rocprof the timing of the kernels it launched. rocprofv2 with --hip-activity does the same, but rocprofv2 with --kernel-trace uses a (more robust) lower-level queue interception method in the HSA library. I suspect the “extra” kernels you are seeing in rocprofv2 come from kernel tracing and have names starting with __amd_rocclr_. These are called BLIT kernels, and HIP frequently uses them in things like the memset routines (basically, imagine a kernel that takes a GPU memory address, a fill value, and a number of bytes, and sets all those addresses to the fill value). IIRC, hip-trace does not self-report the BLIT kernels it uses back to rocprof.
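One way to check this hypothesis is to diff per-kernel dispatch counts between the two traces and see whether every extra v2 record has the __amd_rocclr_ prefix. The sketch below is not part of any ROCm tool. It assumes you have already pulled one kernel name per dispatch record out of each tool's output into a Python list, and that both tools spell kernel names identically, which may not hold if one of them demangles names differently. The function `compare_dispatches` and the example kernel names are hypothetical.

```python
from collections import Counter

# Name prefix that, per the comment above, marks HIP runtime (ROCclr) BLIT kernels.
BLIT_PREFIX = "__amd_rocclr_"


def compare_dispatches(v1_names, v2_names):
    """Compare per-kernel dispatch counts from a v1 trace and a v2 trace.

    v1_names / v2_names: lists with one kernel name per dispatch record
    (hypothetical inputs; extract them from each tool's output yourself).
    """
    v1 = Counter(v1_names)
    v2 = Counter(v2_names)
    # Counter subtraction keeps only positive differences.
    extra_in_v2 = v2 - v1  # dispatches that v2 saw more often than v1
    extra_in_v1 = v1 - v2  # dispatches that v1 saw more often than v2
    blit_extra = {k: n for k, n in extra_in_v2.items() if k.startswith(BLIT_PREFIX)}
    other_extra = {k: n for k, n in extra_in_v2.items() if not k.startswith(BLIT_PREFIX)}
    return {
        "blit_only_in_v2": blit_extra,
        "other_only_in_v2": other_extra,
        "only_in_v1": dict(extra_in_v1),
        # True when the whole mismatch is BLIT kernels missing from v1.
        "explained_by_blit": not other_extra and not extra_in_v1,
    }


if __name__ == "__main__":
    v1 = ["gemm", "softmax", "gemm"]
    v2 = ["gemm", "__amd_rocclr_fillBufferAligned", "softmax", "gemm"]
    print(compare_dispatches(v1, v2))
```

If `explained_by_blit` comes back False, the remaining entries in `other_only_in_v2` or `only_in_v1` point at dispatches that the BLIT explanation does not cover.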

Sep 04 '24 14:09 jrmadsen

@hgtsoi Side note: if you weren’t aware, there is a new rocprofv3 released in ROCm 6.2 as a beta, which is built on top of the new rocprofiler-sdk (also released in ROCm 6.2 as a beta).

rocprofv2 never officially made it out of the beta stage. For various reasons, we completely re-designed the underlying profiling library (rocprofiler-sdk) and rocprofv3 from scratch.

I’d strongly suggest using rocprofv3 over rocprofv2 at this point. rocprofv3 is very close to having feature parity, has a lower overhead than v1 and v2, and is significantly better tested.

Sep 04 '24 14:09 jrmadsen

@hgtsoi closing this ticket due to inactivity. Feel free to reopen it if you still need help.

Oct 11 '24 14:10 jamesxu2