vexcl
Look into using NVRTC for just-in-time CUDA compilation
NVRTC is a runtime compilation library for CUDA C++. It accepts CUDA C++ source code in character string form and creates handles that can be used to obtain the PTX: http://docs.nvidia.com/cuda/nvrtc/index.html
Do these compilation errors still occur with CUDA 9, or even CUDA 10?
Yes, I am still getting the errors described in 35a9f30 with CUDA 9.1:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
To reproduce, check out the nvrtc branch and run cmake . -DVEXCL_BACKEND=CUDA -Bbuild && cmake --build build && ./build/examples/benchmark. Here is what I get:
----------------------------------------------------------
Profiling "double" performance
----------------------------------------------------------
1. Tesla V100-SXM2-16GB
Vector SAXPY (double)
OpenCL
GFLOPS: 62.8019
Bandwidth: 753.623
C++
GFLOPS: 2.0774
Bandwidth: 24.9288
res = 1.1856e-26
Vector arithmetic (double)
OpenCL
GFLOPS: 57.3009
Bandwidth: 764.012
C++
GFLOPS: 1.53007
Bandwidth: 20.401
res = 5.09807e-19
Reduction (double)
OpenCL
GFLOPS: 103.171
Bandwidth: 825.371
C++
GFLOPS: 1.04469
Bandwidth: 8.35756
res = 6.74923e-14
Stencil convolution (double)
OpenCL
GFLOPS: 1181.18
Bandwidth: 9449.44
C++
GFLOPS: 1.34248
Bandwidth: 10.7398
res = 3.33067e-16
SpMV (double)
OpenCL
GFLOPS: 104.44
Bandwidth: 1397.87
C++
GFLOPS: 1.41486
Bandwidth: 18.9371
res = 9.74447e-15
SpMV (CCSR) (double)
OpenCL
GFLOPS: 315.465
Bandwidth: 3635.06
C++
GFLOPS: 1.57218
Bandwidth: 18.116
res = 9.74447e-15
Random numbers per second (double)
OpenCL (threefry): 1.05625e+11
OpenCL (philox): 1.8714e+11
C++ (mt19937): 1.05787e+08
./benchmark(_ZN3vex6detail15print_backtraceEv+0x1a) [0x4462ea]
./benchmark(_ZN3vex7backend4cuda13build_sourcesERKNS1_13command_queueERKSsS6_+0x21b) [0x44c70b]
./benchmark() [0x43ffa3]
./benchmark(_ZN3vex6detail17block_sort_kernelILi256ELi7EN5boost3mpl6vectorImN4mpl_2naES6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS4_IS6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS_4lessImE19vex_function_deviceEEERNS_7backend4cuda6kernelERKNSD_13command_queueE+0xb9d) [0x4bbecd]
./benchmark(_ZN3vex6detail4sortIN5boost6fusion14transform_viewIKNS3_6vectorIIRNS_6vectorImEEEEENS0_21extract_device_vectorENS3_5void_EEENS_4lessImE19vex_function_deviceEEEvRKNS_7backend4cuda13command_queueERT_T0_+0x9f) [0x4be38f]
./benchmark(_ZN3vex6detail9sort_sinkIN5boost6fusion6vectorIIRNS_6vectorImEEEEENS_4lessImEEEEvOT_T0_+0x80) [0x4becd0]
./benchmark(_Z14benchmark_sortIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x189) [0x4bf049]
./benchmark(_Z9run_testsIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x4be) [0x4c8efe]
./benchmark(main+0x745) [0x43d525]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2af97c343c05]
./benchmark() [0x43db52]
default_program(646): warning: declaration does not declare anything
default_program(657): error: union "Shared" has no member "keys0"
default_program(658): error: union "Shared" has no member "keys0"
default_program(716): error: union "Shared" has no member "keys0"
default_program(717): error: union "Shared" has no member "keys0"
4 errors detected in the compilation of "default_program".
*/vexcl/vexcl/backend/cuda/compiler_nvrtc.hpp:104
NVRTC Error (6 - NVRTC_ERROR_COMPILATION)
^C