
Compilation error in GPU example gpu_device_timer

Open · pkestene opened this issue 3 years ago · 4 comments

Hello,

I'm new to timemory. I was trying to build with CUDA/GPU support and hit a compilation error when building the GPU examples, which seems odd to me. The compiler doesn't seem to be able to find the right overload of data_tracker::store, even though I don't see anything wrong in the code.

Here is the full compilation command and the error:

[ 93%] Building CUDA object examples/ex-gpu/v3/CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o
cd /home/pkestene/install/timemory/git/timemory/build/cuda/examples/ex-gpu/v3 && /usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/c++ -DTIMEMORY_CMAKE -DTIMEMORY_USE_BACKENDS_EXTERN -DTIMEMORY_USE_COMMON_EXTERN -DTIMEMORY_USE_COMPONENT_EXTERN -DTIMEMORY_USE_CONFIG_EXTERN -DTIMEMORY_USE_CONTAINERS_EXTERN -DTIMEMORY_USE_CORE_EXTERN -DTIMEMORY_USE_CUDA -DTIMEMORY_USE_CUDA_EXTERN -DTIMEMORY_USE_DATA_TRACKER_EXTERN -DTIMEMORY_USE_ERT_EXTERN -DTIMEMORY_USE_EXTERN -DTIMEMORY_USE_GPU -DTIMEMORY_USE_IO_EXTERN -DTIMEMORY_USE_LIBUNWIND -DTIMEMORY_USE_MANAGER_EXTERN -DTIMEMORY_USE_NETWORK_EXTERN -DTIMEMORY_USE_NVTX -DTIMEMORY_USE_OPERATIONS_EXTERN -DTIMEMORY_USE_PRINTER_EXTERN -DTIMEMORY_USE_RUNTIME_EXTERN -DTIMEMORY_USE_RUSAGE_EXTERN -DTIMEMORY_USE_STATISTICS -DTIMEMORY_USE_STORAGE_EXTERN -DTIMEMORY_USE_TIMESTAMP_EXTERN -DTIMEMORY_USE_TIMING_EXTERN -DTIMEMORY_USE_TRIP_COUNT_EXTERN -DTIMEMORY_USE_USER_BUNDLE_EXTERN -DTIMEMORY_USE_VARIADIC_EXTERN -DTIMEMORY_USE_XML -DTIMEMORY_VEC=256 -DUNW_LOCAL_ONLY -Dex_kernel_instrument_v3_EXPORTS -I/home/pkestene/install/timemory/git/timemory/build/cuda/source -I/home/pkestene/install/timemory/git/timemory/source -I/usr/local/cuda-11.6/include -isystem=/usr/local/cuda-11.6/targets/x86_64-linux/include -arch=sm_75 -O3 -DNDEBUG --generate-code=arch=compute_75,code=[compute_75,sm_75] -arch=sm_75 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --extended-lambda -Xcompiler=-W -Xcompiler=-Wall -Xcompiler=-Wno-unknown-pragmas -Xcompiler=-Wno-ignored-attributes -Xcompiler=-Wno-attributes -Xcompiler=-Wno-missing-field-initializers -Xcompiler=-Wno-class-memaccess -Xcompiler=-fno-signaling-nans -Xcompiler=-fno-trapping-math -Xcompiler=-fno-signed-zeros -Xcompiler=-ffinite-math-only -Xcompiler=-fno-math-errno -Xcompiler=-fpredictive-commoning -Xcompiler=-fvariable-expansion-in-unroller -Xcompiler=-faligned-new -Xcompiler=-ftls-model=initial-exec -Xcompiler=-rdynamic -Xcompiler=-finline-functions -Xcompiler=-funroll-loops -Xcompiler=-ftree-vectorize -Xcompiler=-ftree-loop-optimize -Xcompiler=-ftree-loop-vectorize -lineinfo -std=c++14 -x cu -c /home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp -o CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o
/home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.hpp(134): warning #177-D: variable "_data" was declared but never referenced

/home/pkestene/install/timemory/git/timemory/source/timemory/components/data_tracker/components.hpp(677): error: no instance of overloaded function "tim::component::data_tracker<InpT, Tag>::store [with InpT=double, Tag=gpu_data_tag]" matches the argument list
            argument types are: (std::plus<double>, double)
            object type is: tim::component::data_tracker<double, gpu_data_tag>
          detected during instantiation of "tim::component::data_tracker<InpT, Tag>::this_type *tim::component::data_tracker<InpT, Tag>::add_secondary(const std::string &, FuncT &&, T &&, tim::component::data_tracker<InpT, Tag>::enable_if_acceptable_t<T, int>) [with InpT=double, Tag=gpu_data_tag, FuncT=std::plus<double>, T=double &]" 
/home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp(90): here

The host compiler is g++-11; I also tried g++-10, and the error is still there.

Any help appreciated.

pkestene, Feb 06 '22 17:02

Interesting... that overload is used quite often. Could you try replacing std::plus<double>{} with a lambda, e.g. [](double lhs, double rhs) { return lhs + rhs; }?
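The suggestion above checks whether overload resolution for a store(FuncT&&, T&&)-style member depends on the kind of callable passed. The sketch below uses a hypothetical mini_tracker class (it is not timemory's data_tracker) that only mirrors the argument types from the error message: a binary operation plus a double lvalue. With a conforming host compiler such as g++, both the std::plus<double> object and the lambda should select the same combining overload.

```cpp
#include <functional>
#include <utility>

// Hypothetical, heavily simplified stand-in for a tracker whose store() has a
// "combine with a binary operation" overload. This is NOT timemory's
// data_tracker; it only mirrors the call shape from the error message:
// store(FuncT&&, T&&) with FuncT = std::plus<double> or a lambda, T = double&.
template <typename InpT>
class mini_tracker
{
public:
    // Plain store: overwrite the current value.
    void store(InpT v) { m_value = v; }

    // Combining store: only participates in overload resolution when
    // f(InpT, T) is a valid call (SFINAE on the default template argument).
    template <typename FuncT, typename T,
              typename = decltype(std::declval<FuncT>()(std::declval<InpT>(),
                                                       std::declval<T>()))>
    void store(FuncT&& f, T&& v)
    {
        m_value = std::forward<FuncT>(f)(m_value, std::forward<T>(v));
    }

    InpT get() const { return m_value; }

private:
    InpT m_value{};
};

// Exercises both kinds of binary operation against the same overload.
double combine_demo()
{
    mini_tracker<double> t;
    t.store(1.5);
    double x = 2.0;                                                // lvalue, so T = double&
    t.store(std::plus<double>{}, x);                               // function object
    t.store([](double lhs, double rhs) { return lhs + rhs; }, x);  // lambda
    return t.get();                                                // 1.5 + 2.0 + 2.0
}
```

If a compiler accepts one call but rejects the other, the difference lies in how it handles the callable type rather than in the overload set itself.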

jrmadsen, Feb 08 '22 19:02

Ah, based on this line: [ 93%] Building CUDA object examples/ex-gpu/v3/CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o, I think this might be an NVCC bug. Unfortunately, NVCC is quite unreliable when it comes to templates. If the above fails, could you try a different CUDA version (rather than a different GCC version) to verify whether this is a CUDA 11.6 bug?
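When comparing toolkits, it can help to confirm which front end actually compiled a given translation unit. The sketch below reports it at compile time from predefined macros: __NVCC__, __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__ for NVCC, and the usual __clang__ and __GNUC__ macros for host compilers. The function name compiler_description is hypothetical. In the build log above, the .cpp file is passed to nvcc with -x cu, so it is compiled as CUDA source and the NVCC branch would be taken.

```cpp
#include <string>

// Hypothetical helper: reports which compiler front end processed this
// translation unit, based only on predefined macros.
std::string compiler_description()
{
#if defined(__NVCC__) && defined(__CUDACC_VER_MAJOR__)
    // NVCC compiling CUDA source (including .cpp files passed with -x cu).
    return "nvcc " + std::to_string(__CUDACC_VER_MAJOR__) + "." +
           std::to_string(__CUDACC_VER_MINOR__);
#elif defined(__clang__)
    // Checked before __GNUC__ because clang also defines __GNUC__.
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#else
    return "unknown compiler";
#endif
}
```

Printing this from a small test program built once per CUDA version makes it easy to tie each failure to the exact toolkit that produced it.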

jrmadsen, Feb 08 '22 19:02

Thanks for your answer. Unfortunately:

  • same error with CUDA toolkit 11.5.2
  • if I change std::plus<double>{} to [](double lhs, double rhs) { return lhs + rhs; }, I get a similar error:
/data/pkestene/install/timemory/git/timemory/source/timemory/components/data_tracker/components.hpp(677): error: no instance of overloaded function "tim::component::data_tracker<InpT, Tag>::store [with InpT=double, Tag=gpu_data_tag]" matches the argument list
            argument types are: (lambda [](double, double)->double, double)
            object type is: tim::component::data_tracker<double, gpu_data_tag>
          detected during instantiation of "tim::component::data_tracker<InpT, Tag>::this_type *tim::component::data_tracker<InpT, Tag>::add_secondary(const std::string &, FuncT &&, T &&, tim::component::data_tracker<InpT, Tag>::enable_if_acceptable_t<T, int>) [with InpT=double, Tag=gpu_data_tag, FuncT=lambda [](double, double)->double, T=double &]" 
/data/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp(92): here

pkestene, Feb 08 '22 21:02

Yeah, I was able to reproduce it. It is definitely an NVCC bug: if I make the necessary changes to compile gpu_device_timer.cpp and gpu_op_tracker.cpp with the host compiler (basically guarding the kernel launches and device functions with #if defined(TIMEMORY_GPUCC) and tweaking the CMakeLists.txt so that only ex_kernel_instrument.cpp is set as a CUDA source), then it compiles and runs fine. Let me think a bit more about how this should be handled and get back to you, because I am getting tired of having to create workarounds for templates with NVCC (e.g. #237).
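A rough sketch of the guard pattern described above: device functions and kernel launches sit behind #if defined(TIMEMORY_GPUCC), so the same file still builds when it is handed to the host compiler. The names scale_kernel and scale are hypothetical and do not come from the timemory examples. In timemory, TIMEMORY_GPUCC is provided by the library's headers; this standalone version assumes an equivalent can be derived from __CUDACC__. The accompanying CMake change (marking only ex_kernel_instrument.cpp as a CUDA source) is not shown.

```cpp
// Hypothetical stand-in: timemory's headers are assumed to provide TIMEMORY_GPUCC
// when a GPU compiler is active; this self-contained sketch derives it from
// __CUDACC__, which NVCC defines when compiling CUDA source.
#if !defined(TIMEMORY_GPUCC) && defined(__CUDACC__)
#    define TIMEMORY_GPUCC 1
#endif

#include <cstddef>
#include <vector>

#if defined(TIMEMORY_GPUCC)
#    include <cuda_runtime.h>

// Device code: only seen when the file is compiled as CUDA.
__global__ void scale_kernel(double* data, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i < n) data[i] *= factor;
}
#endif

// Hypothetical host-callable entry point. Under NVCC it launches the kernel;
// with a plain host compiler the CUDA branch is skipped entirely and a CPU loop
// runs instead, so the host compiler never parses any CUDA syntax.
void scale(std::vector<double>& data, double factor)
{
    if(data.empty()) return;
#if defined(TIMEMORY_GPUCC)
    // Error checking of the CUDA runtime calls is omitted for brevity.
    const int         n      = static_cast<int>(data.size());
    const std::size_t bytes  = data.size() * sizeof(double);
    double*           d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, data.data(), bytes, cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, n, factor);
    cudaMemcpy(data.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
#else
    for(auto& v : data)
        v *= factor;
#endif
}
```

Built with g++, only the CPU loop is compiled; built with nvcc -x cu, the kernel path is taken. Either way the caller sees the same scale() function.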

jrmadsen, Feb 08 '22 21:02