
[Issue]: rocprofv2 gets more kernel dispatches than rocprofv1

Open hgtsoi opened this issue 1 year ago • 3 comments

Problem Description

When using API tracing on vLLM inference, rocprof reports fewer kernel dispatch records than rocprofv2. Which result is more likely to be correct? What are the possible reasons for the mismatch between the kernel records of v1 and v2?

Operating System

OS: NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)"

CPU

CPU: model name : Hygon C86 3380 8-core Processor

GPU

AMD Instinct MI250

ROCm Version

ROCm 5.7.1

ROCm Component

rocprofiler

Steps to Reproduce

Run rocprof and rocprofv2 with hip-trace and kernel-trace on a vLLM inference app.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

hgtsoi, Aug 26 '24 07:08

Hi @hgtsoi, can you provide more detailed reproduction steps? I am not sure exactly what "vllm inference app" refers to. If you are running a specific workload, can you please provide the binary (or, even better, the source repo) so we can reproduce the issue internally and diagnose it?

jamesxu2, Sep 04 '24 13:09

rocprof traces kernels through a mechanism built into HIP, which effectively amounts to HIP reporting back to rocprof the timing of the kernels it launched. rocprofv2 with --hip-activity does the same, but rocprofv2 with --kernel-trace uses a (more robust) lower-level queue interception method in the HSA library. I suspect the “extra” kernels you are seeing in rocprofv2 come from kernel tracing and have names starting with __amd_rocclr_. These are called BLIT kernels, and HIP frequently uses them in things like the memset routines (basically, imagine a kernel that takes a GPU memory address, a fill value, and a number of bytes, and sets every byte in that range to the fill value). IIRC, hip-trace does not self-report the BLIT kernels it uses back to rocprof.
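One way to check this hypothesis is to compare the kernel names from the two traces and see whether the surplus in v2 is entirely made up of names with the __amd_rocclr_ prefix. The following is only a rough sketch of that comparison: the function names and the sample kernel names are made up for illustration, and it assumes the kernel names have already been extracted from each tool's output into plain lists (the output format and column names differ between tools and versions, so that step is not shown).

```python
from collections import Counter

# Prefix of the BLIT kernel names, as described in the comment above.
BLIT_PREFIX = "__amd_rocclr_"


def split_blit(kernel_names):
    """Split kernel names into (blit, other) lists based on the prefix."""
    blit = [n for n in kernel_names if n.startswith(BLIT_PREFIX)]
    other = [n for n in kernel_names if not n.startswith(BLIT_PREFIX)]
    return blit, other


def extra_dispatches(v1_names, v2_names):
    """Return a Counter of dispatches that appear in v2 but not in v1."""
    # Counter subtraction keeps only positive differences.
    return Counter(v2_names) - Counter(v1_names)


# Placeholder kernel names, NOT taken from a real trace.
v1 = ["gemm_kernel", "softmax_kernel"]
v2 = [
    "gemm_kernel",
    "__amd_rocclr_fill_placeholder",
    "softmax_kernel",
    "__amd_rocclr_fill_placeholder",
]

extra = extra_dispatches(v1, v2)
blit_extra, other_extra = split_blit(list(extra.elements()))
print("extra BLIT dispatches:", len(blit_extra))
print("extra non-BLIT dispatches:", len(other_extra))
```

If the non-BLIT count comes out as zero on real data, the mismatch is consistent with the BLIT explanation; if not, something else is contributing to the difference.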

jrmadsen, Sep 04 '24 14:09

@hgtsoi Side note: if you weren’t aware, there is a new rocprofv3 released in ROCm 6.2 as a beta, which is built on top of the new rocprofiler-sdk (also released in ROCm 6.2 as a beta).

rocprofv2 never officially made it out of the beta stage. For various reasons, we completely re-designed the underlying profiling library (rocprofiler-sdk) and rocprofv3 from scratch.

I’d strongly suggest using rocprofv3 over rocprofv2 at this point. rocprofv3 is very close to having feature parity, has a lower overhead than v1 and v2, and is significantly better tested.

jrmadsen, Sep 04 '24 14:09

@hgtsoi closing this ticket due to inactivity. Feel free to reopen it if you still need help.

jamesxu2, Oct 11 '24 14:10