omnitrace hangs before hostCallback function

Open jakub-homola opened this issue 1 year ago • 5 comments

Hello,

I am trying to trace my AMDGPU application with Omnitrace, but I am running into an issue with a host callback function. Using hipStreamAddCallback I submit a host function into a stream. Without Omnitrace, the program works as expected. But with Omnitrace, the program hangs and the host function is never launched.

Reproducer program:

#include <cstdio>
#include <cstdlib>
#include <hip/hip_runtime.h>

#define CHECK(status) do { check((status), __FILE__, __LINE__); } while(false)
inline static void check(hipError_t error_code, const char *file, int line)
{
    if (error_code != hipSuccess)
    {
        fprintf(stderr, "HIP Error %d %s: %s. In file '%s' on line %d\n", error_code, hipGetErrorName(error_code), hipGetErrorString(error_code), file, line);
        fflush(stderr);
        exit(error_code);
    }
}

__global__ void dummy_kernel(int a)
{
    printf("I am dummy kernel %d\n", a);
}

int main()
{
    printf("AAA\n");
    CHECK(hipDeviceSynchronize());
    printf("BBB\n");
    dummy_kernel<<< 1,1 >>>(1);
    printf("CCC\n");
    CHECK(hipDeviceSynchronize());
    printf("DDD\n");
    CHECK(hipStreamAddCallback(0, [](hipStream_t stream_, hipError_t status_, void * arg){
        printf("I am host function\n");
    }, nullptr, 0));
    printf("EEE\n");
    CHECK(hipDeviceSynchronize());
    printf("FFF\n");
    dummy_kernel<<< 1,1 >>>(2);
    printf("GGG\n");
    CHECK(hipDeviceSynchronize());
    printf("HHH\n");

    return 0;
}

When running it without omnitrace, the program correctly outputs

AAA
BBB
CCC
I am dummy kernel 1
DDD
EEE
I am host function
FFF
GGG
I am dummy kernel 2
HHH

but with omnitrace, it only outputs

AAA
BBB
CCC
I am dummy kernel 1
DDD
EEE

and then nothing more: it just hangs, seemingly forever.

I am compiling the program using

hipcc -g -O2 source.hip.cpp -o program.x

And running it using

omnitrace-sample -- ./program.x

omnitrace-instrument seems to have the same problem.

I am on a LUMI-G compute node (MI250x), using rocm-5.2.3 (the only version properly supported there, loaded with module load LUMI/23.03 rocm/5.2.3). I installed omnitrace using this guide, just running the installation script and adding the appropriate directories to PATH and LD_LIBRARY_PATH.

$ omnitrace-sample --version
omnitrace-sample v1.10.2 (rev: 0b751d2aef7d32d8b4fab184d0b34d4013b6d986, tag: v1.10.2, compiler: GNU v7.5.0, rocm: v5.2.x)

In case I missed any details, please ask.

I would appreciate any help.

jakub-homola · Sep 21 '23 18:09