Compilation errors in CUDA HAL Driver with tracing features enabled

Open stellaraccident opened this issue 1 year ago • 4 comments

cmake -DIREE_BUILD_COMPILER=OFF -DIREE_ENABLE_RUNTIME_TRACING=ON -DIREE_TRACING_PROVIDER=console -DIREE_HAL_DRIVER_CUDA=ON .
ninja

Compilation errors:

FAILED: runtime/src/iree/hal/drivers/cuda/CMakeFiles/iree_hal_drivers_cuda_cuda.objects.dir/tracing.c.o 
ccache /usr/bin/clang -DIREE_TRACING_MODE=2 -DIREE_TRACING_PROVIDER_H=\"iree/base/tracing/console.h\" -I/home/stella/src/iree -I/home/stella/src/iree-build -I/home/stella/src/iree/runtime/src -I/home/stella/src/iree-build/runtime/src -I/home/stella/src/iree-build/build_tools/third_party/cuda/12.2.1/linux-x86_64/include -I/home/stella/src/iree/third_party/nccl -I/home/stella/src/iree-build/runtime/src/iree/base/internal/flatcc -I/home/stella/src/iree-build/runtime/src/iree/schemas -isystem /home/stella/src/iree/third_party/flatcc/include -O2 -g   -gsplit-dwarf -ggnu-pubnames -std=gnu11 -flto=thin -fPIC -fvisibility=hidden -Werror -Wall -Wno-error=deprecated-declarations -Wno-ambiguous-member-template -Wno-char-subscripts -Wno-extern-c-compat -Wno-gnu-alignof-expression -Wno-gnu-variable-sized-type-not-at-end -Wno-ignored-optimization-argument -Wno-invalid-offsetof -Wno-invalid-source-encoding -Wno-mismatched-tags -Wno-pointer-sign -Wno-reserved-user-defined-literal -Wno-return-type-c-linkage -Wno-self-assign-overloaded -Wno-sign-compare -Wno-signed-unsigned-wchar -Wno-strict-overflow -Wno-trigraphs -Wno-unknown-pragmas -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-unused-const-variable -Wno-unused-function -Wno-unused-local-typedef -Wno-unused-private-field -Wno-user-defined-warnings -Wno-missing-braces -Wc++20-extensions -Wctad-maybe-unsupported -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Wimplicit-fallthrough -Winfinite-recursion -Wliteral-conversion -Wnon-virtual-dtor -Woverloaded-virtual -Wpointer-arith -Wself-assign -Wstring-conversion -Wtautological-overlap-compare -Wthread-safety -Wthread-safety-beta -Wunused-comparison -Wvla -fno-lax-vector-conversions -fno-omit-frame-pointer -fmacro-prefix-map=/home/stella/src/iree=iree -I/home/stella/src/iree/third_party/flatcc/include/ -I/home/stella/src/iree/third_party/flatcc/include/flatcc/reflection/ -MD -MT 
runtime/src/iree/hal/drivers/cuda/CMakeFiles/iree_hal_drivers_cuda_cuda.objects.dir/tracing.c.o -MF runtime/src/iree/hal/drivers/cuda/CMakeFiles/iree_hal_drivers_cuda_cuda.objects.dir/tracing.c.o.d -o runtime/src/iree/hal/drivers/cuda/CMakeFiles/iree_hal_drivers_cuda_cuda.objects.dir/tracing.c.o -c /home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:65:24: error: implicit declaration of function 'iree_tracing_time' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
  *out_cpu_timestamp = iree_tracing_time();
                       ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:125:19: error: implicit declaration of function 'iree_tracing_gpu_context_allocate' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    context->id = iree_tracing_gpu_context_allocate(
                  ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:126:9: error: use of undeclared identifier 'IREE_TRACING_GPU_CONTEXT_TYPE_CUDA'
        IREE_TRACING_GPU_CONTEXT_TYPE_CUDA, queue_name.data, queue_name.size,
        ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:206:7: error: implicit declaration of function 'iree_tracing_gpu_zone_notify' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
      iree_tracing_gpu_zone_notify(context->id, query_id, gpu_timestamp);
      ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:252:3: error: implicit declaration of function 'iree_tracing_gpu_zone_begin' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
  iree_tracing_gpu_zone_begin(context->id, query_id, src_loc);
  ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:252:3: note: did you mean 'iree_tracing_zone_end'?
/home/stella/src/iree/runtime/src/iree/base/tracing/console.h:92:6: note: 'iree_tracing_zone_end' declared here
void iree_tracing_zone_end(iree_zone_id_t zone_id);
     ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:263:3: error: implicit declaration of function 'iree_tracing_gpu_zone_begin_external' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
  iree_tracing_gpu_zone_begin_external(context->id, query_id, file_name,
  ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:263:3: note: did you mean 'iree_tracing_zone_begin_external_impl'?
/home/stella/src/iree/runtime/src/iree/base/tracing/console.h:88:37: note: 'iree_tracing_zone_begin_external_impl' declared here
IREE_MUST_USE_RESULT iree_zone_id_t iree_tracing_zone_begin_external_impl(
                                    ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:273:3: error: implicit declaration of function 'iree_tracing_gpu_zone_end' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
  iree_tracing_gpu_zone_end(context->id, query_id);
  ^
/home/stella/src/iree/runtime/src/iree/hal/drivers/cuda/tracing.c:273:3: note: did you mean 'iree_tracing_zone_end'?
/home/stella/src/iree/runtime/src/iree/base/tracing/console.h:92:6: note: 'iree_tracing_zone_end' declared here
void iree_tracing_zone_end(iree_zone_id_t zone_id);
     ^
7 errors generated.

stellaraccident, Feb 14 '24 06:02

Looks like we are missing an iree_tracing_time definition in the console provider. The tracy provider, which is the default and the one most commonly used, has it. Needs something like https://github.com/wolfpld/tracy/blob/master/public/client/TracyProfiler.hpp#L194

antiagainst, Feb 14 '24 06:02

We can also add test coverage for -DIREE_TRACING_PROVIDER=console on CI:

https://github.com/openxla/iree/blob/c02b89e3c7e22eff009fc318132b5ed3fe9a2d97/.github/workflows/ci.yml#L655-L682

https://github.com/openxla/iree/blob/main/build_tools/cmake/build_tracing.sh

(ugh, I'm tempted to inline / refactor those scripts... annoying having them separate from the workflows and wrapped in so much boilerplate)

ScottTodd, Feb 14 '24 16:02

> Looks like we are missing an iree_tracing_time definition in the console provider. The tracy provider, which is the default and the one most commonly used, has it. Needs something like https://github.com/wolfpld/tracy/blob/master/public/client/TracyProfiler.hpp#L194

The Vulkan driver uses IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE to gate usage of iree_tracing_time. I'll try switching the CUDA driver to the same thing: https://github.com/openxla/iree/blob/da982154aebccb41c1cf9bf5594097a2e6906b19/runtime/src/iree/hal/drivers/cuda/tracing.c#L9
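A minimal sketch of what gating on the feature bit could look like, not the actual patch. The macro values below are hypothetical stand-ins so the snippet compiles on its own (the real ones come from IREE's tracing headers), the helper names sample_cpu_timestamp and device_tracing_enabled are made up for illustration, and the stub iree_tracing_time exists only so the device branch links:

```c
#include <stdint.h>

// Hypothetical stand-in bit values; the real definitions live in IREE's
// tracing headers and may differ.
#define IREE_TRACING_FEATURE_INSTRUMENTATION (1 << 0)
#define IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE (1 << 1)

// Assume a provider (such as console) that only enables host instrumentation.
#ifndef IREE_TRACING_FEATURES
#define IREE_TRACING_FEATURES IREE_TRACING_FEATURE_INSTRUMENTATION
#endif

#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE

// Stub standing in for the provider's timestamp function in this sketch.
static int64_t iree_tracing_time(void) { return 1; }

// Device tracing is available: the driver may call the GPU tracing APIs.
static int64_t sample_cpu_timestamp(void) { return iree_tracing_time(); }
static int device_tracing_enabled(void) { return 1; }

#else

// Device tracing is unavailable: compile to no-ops so nothing references
// symbols the provider does not declare.
static int64_t sample_cpu_timestamp(void) { return 0; }
static int device_tracing_enabled(void) { return 0; }

#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
```

With the assumed default above, the #else branch is selected, so no call to iree_tracing_time or the iree_tracing_gpu_* functions is ever compiled.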

We could also define those functions for other instrumentation levels: https://github.com/openxla/iree/blob/da982154aebccb41c1cf9bf5594097a2e6906b19/runtime/src/iree/base/tracing/tracy.h#L134-L137
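For illustration only, a provider-independent timestamp helper could be sketched as below. The name example_tracing_time_ns is hypothetical and is not an IREE function, and this uses the C11 wall clock via timespec_get for portability; a real tracing clock would more likely want a monotonic, high-resolution source:

```c
#include <stdint.h>
#include <time.h>

// Hypothetical fallback timestamp in nanoseconds; not part of IREE.
// Returns 0 if the clock cannot be read.
static int64_t example_tracing_time_ns(void) {
  struct timespec ts;
  if (timespec_get(&ts, TIME_UTC) != TIME_UTC) return 0;
  return (int64_t)ts.tv_sec * 1000000000ll + (int64_t)ts.tv_nsec;
}
```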

ScottTodd, Feb 19 '24 17:02

Guarding on IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE is the correct fix. Thanks, Scott!

benvanik, Feb 19 '24 18:02