Jonathan R. Madsen
@RookieT0T I looked into your assembly (note: please format in a code block in the future) and it is unclear why you are expecting data to be in the L2...
It should get loaded into L2 as a result of the read from global memory, but if our HW counted that as an L2 cache hit, what would be the...
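To make the hit/miss accounting concrete, here is a toy direct-mapped cache. It is only an illustration under simplifying assumptions: the `toy_cache` class, the line size, and the set count are hypothetical, and real GPU L2 caches differ in associativity, replacement policy, and how their hardware counters classify accesses. The point it shows is that the first read of a line is counted as a miss that fills the line, and only a later access to the same line is counted as a hit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy direct-mapped cache used only to illustrate hit/miss accounting.
// NOT a model of any real GPU L2 (no associativity, no replacement policy).
class toy_cache {
public:
    toy_cache(std::size_t num_sets, std::size_t line_bytes)
        : tags_(num_sets), line_bytes_(line_bytes) {
        assert(num_sets > 0 && line_bytes > 0);
    }

    // Returns true on a hit. On a miss the line is filled, so a later
    // access to the same line will hit.
    bool access(std::uint64_t addr) {
        const std::uint64_t line = addr / line_bytes_;
        const std::size_t set = static_cast<std::size_t>(line % tags_.size());
        const std::uint64_t tag = line / tags_.size();
        if (tags_[set] && *tags_[set] == tag) {
            ++hits_;
            return true;
        }
        tags_[set] = tag;  // fill on miss
        ++misses_;
        return false;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::vector<std::optional<std::uint64_t>> tags_;
    std::size_t line_bytes_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

For example, with 128-byte lines, reading address 0 and then address 64 yields one miss (the first touch) followed by one hit.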
@RookieT0T If I use this code:

```cpp
#ifdef NDEBUG
# undef NDEBUG
#endif
#include
#include
#include
#include
#include
#include
#define HIP_API_CALL(CALL) \
    { \
        hipError_t error_ = (CALL); \
        if(error_...
```
There are inherent underlying problems here which prevent this from being easily implemented. Roctracer/rocprofiler-sdk do not have control over the clock for GPU timestamps and those timestamps are given to...
rocprof uses a method built into HIP to trace kernels, which effectively amounts to HIP reporting the timing of the kernels it launched back to rocprof. rocprofv2 using `--hip-activity` does...
@hgtsoi Side note: if you weren’t aware, there is a new `rocprofv3` released in ROCm 6.2 as a beta, which is built on top of the new [rocprofiler-sdk](https://github.com/ROCm/rocprofiler-sdk) (also released...
I'll think about this and get back to you. An alternative might be to instead just provide a CMake target which adds `-finstrument-functions` flags to the target and provides a...
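With `-finstrument-functions`, GCC and Clang insert calls to `__cyg_profile_func_enter(void* this_fn, void* call_site)` and `__cyg_profile_func_exit(void* this_fn, void* call_site)` at the entry and exit of every instrumented function, so a target built that way needs something to define those two hooks. The sketch below is a hypothetical minimal definition: the `demo` namespace, `find_or_insert`, `call_count`, and the 256-entry table are illustrative names, and the table is not thread-safe. The hooks and every helper they call are marked `no_instrument_function` so they do not recurse into themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for a -finstrument-functions sketch.
// Single-threaded only: the shared table is not synchronized.
namespace demo {
struct entry {
    void* fn = nullptr;
    std::uint64_t calls = 0;
};

inline constexpr std::size_t capacity = 256;
inline entry table[capacity] = {};
inline thread_local std::int64_t depth = 0;  // current call depth

// Linear scan: slots fill in order, so the first empty slot ends the search.
__attribute__((no_instrument_function))
inline entry* find_or_insert(void* fn) {
    for (std::size_t i = 0; i < capacity; ++i) {
        if (table[i].fn == fn) return &table[i];
        if (table[i].fn == nullptr) {
            table[i].fn = fn;
            return &table[i];
        }
    }
    return nullptr;  // table full: this function is not recorded
}

__attribute__((no_instrument_function))
inline std::uint64_t call_count(void* fn) {
    for (std::size_t i = 0; i < capacity && table[i].fn != nullptr; ++i)
        if (table[i].fn == fn) return table[i].calls;
    return 0;
}
}  // namespace demo

// Hook signatures as documented for GCC's -finstrument-functions.
extern "C" {
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* /*call_site*/) {
    if (demo::entry* e = demo::find_or_insert(this_fn)) ++e->calls;
    ++demo::depth;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* /*this_fn*/, void* /*call_site*/) {
    --demo::depth;
    assert(demo::depth >= 0 && "unbalanced enter/exit");
}
}
```

In this arrangement the hook file itself would be compiled without `-finstrument-functions` and linked into the instrumented target; only the target's own sources would receive the flag.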
Hi @pelahi, currently the flat profiles are handled by timemory and the min/max/stddev is stored in a constant size data structure that does not require dynamic allocations: [statistics.hpp](https://github.com/NERSC/timemory/blob/415650ee26f358218908983c87212b620c3a0328/source/timemory/data/statistics.hpp#L173). In order...
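One way min/max/stddev can live in a constant-size structure is an online (Welford) accumulator: it only needs the sample count, the running mean, and the running sum of squared deviations, plus min and max, so its size is fixed no matter how many samples arrive. The `running_stats` struct below is a hypothetical sketch of that idea, not the timemory implementation linked above. Its `merge` uses the standard pairwise (Chan et al.) update so that, for example, per-thread accumulators can be combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <initializer_list>
#include <limits>

// Hypothetical fixed-size accumulator: no heap allocation regardless of
// how many samples are pushed (Welford's online algorithm).
struct running_stats {
    std::uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean
    double min = std::numeric_limits<double>::max();
    double max = std::numeric_limits<double>::lowest();

    void push(double x) {
        ++count;
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
        min = std::min(min, x);
        max = std::max(max, x);
    }

    // Combine another accumulator into this one (pairwise update).
    void merge(const running_stats& o) {
        if (o.count == 0) return;
        if (count == 0) { *this = o; return; }
        const double na = static_cast<double>(count);
        const double nb = static_cast<double>(o.count);
        const double n = na + nb;
        const double delta = o.mean - mean;
        mean += delta * nb / n;
        m2 += o.m2 + delta * delta * na * nb / n;
        count += o.count;
        min = std::min(min, o.min);
        max = std::max(max, o.max);
    }

    // Population variance; 0 when no samples have been pushed.
    double variance() const {
        return count > 0 ? m2 / static_cast<double>(count) : 0.0;
    }
    double stddev() const { return std::sqrt(variance()); }
};
```

Pushing 2, 4, 4, 4, 5, 5, 7, 9 gives a mean of 5 and a population standard deviation of 2, and splitting those samples across two accumulators and merging them gives the same result.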
As you can see in the [hatchet docs for generating a flat profile](https://hatchet.readthedocs.io/en/latest/analysis_examples.html#generating-a-flat-profile), it is very straightforward to convert the trace to a flat profile and pandas has built-in capabilities...
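The reduction involved in turning a trace into a flat profile (group the trace records by name, then aggregate their durations, roughly a pandas `groupby` followed by a sum) is small enough to sketch directly. The `trace_event` and `flat_entry` types and the `flatten` function below are hypothetical; durations are treated as inclusive times, so this sketch does not compute exclusive (self) time, which would also need the call-tree structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace record: one completed region and its inclusive duration.
struct trace_event {
    std::string name;
    std::uint64_t duration_ns;
};

// One row of the flat profile.
struct flat_entry {
    std::string name;
    std::uint64_t count = 0;
    std::uint64_t total_ns = 0;
    std::uint64_t min_ns = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max_ns = 0;
};

// Group by name, aggregate, then sort rows by total time (descending).
std::vector<flat_entry> flatten(const std::vector<trace_event>& trace) {
    std::map<std::string, flat_entry> by_name;
    for (const auto& ev : trace) {
        flat_entry& e = by_name[ev.name];
        e.name = ev.name;
        ++e.count;
        e.total_ns += ev.duration_ns;
        e.min_ns = std::min(e.min_ns, ev.duration_ns);
        e.max_ns = std::max(e.max_ns, ev.duration_ns);
    }
    std::vector<flat_entry> rows;
    rows.reserve(by_name.size());
    for (auto& kv : by_name) rows.push_back(std::move(kv.second));
    std::sort(rows.begin(), rows.end(),
              [](const flat_entry& a, const flat_entry& b) {
                  return a.total_ns > b.total_ns;
              });
    return rows;
}
```

For example, the trace `{"a", 10}, {"b", 50}, {"a", 30}, {"c", 5}` flattens to three rows ordered `b` (50 ns), `a` (2 calls, 40 ns), `c` (5 ns).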
@drbenmorgan If I had to guess, the failures are because the `os-release` fields are no longer supported: https://github.com/jrmadsen/PTL/blob/f892a93d79615ed8f51c1b9c71f0f7b771dd8223/.github/workflows/macos-ci.yml#L22