
Look into using nvrtc for just-in-time cuda compilation

Open ddemidov opened this issue 10 years ago • 2 comments

NVRTC is a runtime compilation library for CUDA C++. It accepts CUDA C++ source code in character string form and creates handles that can be used to obtain the PTX: http://docs.nvidia.com/cuda/nvrtc/index.html

ddemidov, Mar 23 '15 10:03

Curious whether these compilation errors still exist with CUDA 9 or even CUDA 10?

rosenrodt, Dec 01 '18 08:12

Yes, I am still getting the errors described in 35a9f30 with CUDA 9.1:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

To reproduce this, check out the nvrtc branch and run:

    cmake . -DVEXCL_BACKEND=CUDA -Bbuild && cmake --build build && ./build/examples/benchmark

Here is what I get:

----------------------------------------------------------
Profiling "double" performance
----------------------------------------------------------
1. Tesla V100-SXM2-16GB

Vector SAXPY (double)
  OpenCL
    GFLOPS:    62.8019
    Bandwidth: 753.623
  C++
    GFLOPS:    2.0774
    Bandwidth: 24.9288
  res = 1.1856e-26

Vector arithmetic (double)
  OpenCL
    GFLOPS:    57.3009
    Bandwidth: 764.012
  C++
    GFLOPS:    1.53007
    Bandwidth: 20.401
  res = 5.09807e-19

Reduction (double)
  OpenCL
    GFLOPS:    103.171
    Bandwidth: 825.371
  C++
    GFLOPS:    1.04469
    Bandwidth: 8.35756
  res = 6.74923e-14

Stencil convolution (double)
  OpenCL
    GFLOPS:    1181.18
    Bandwidth: 9449.44
  C++
    GFLOPS:    1.34248
    Bandwidth: 10.7398
  res = 3.33067e-16

SpMV (double)
  OpenCL
    GFLOPS:    104.44
    Bandwidth: 1397.87
  C++
    GFLOPS:    1.41486
    Bandwidth: 18.9371
  res = 9.74447e-15

SpMV (CCSR) (double)
  OpenCL
    GFLOPS:    315.465
    Bandwidth: 3635.06
  C++
    GFLOPS:    1.57218
    Bandwidth: 18.116
  res = 9.74447e-15

Random numbers per second (double)
    OpenCL (threefry): 1.05625e+11
    OpenCL (philox):   1.8714e+11
    C++    (mt19937):  1.05787e+08

./benchmark(_ZN3vex6detail15print_backtraceEv+0x1a) [0x4462ea]
./benchmark(_ZN3vex7backend4cuda13build_sourcesERKNS1_13command_queueERKSsS6_+0x21b) [0x44c70b]
./benchmark() [0x43ffa3]
./benchmark(_ZN3vex6detail17block_sort_kernelILi256ELi7EN5boost3mpl6vectorImN4mpl_2naES6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS4_IS6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS_4lessImE19vex_function_deviceEEERNS_7backend4cuda6kernelERKNSD_13command_queueE+0xb9d) [0x4bbecd]
./benchmark(_ZN3vex6detail4sortIN5boost6fusion14transform_viewIKNS3_6vectorIIRNS_6vectorImEEEEENS0_21extract_device_vectorENS3_5void_EEENS_4lessImE19vex_function_deviceEEEvRKNS_7backend4cuda13command_queueERT_T0_+0x9f) [0x4be38f]
./benchmark(_ZN3vex6detail9sort_sinkIN5boost6fusion6vectorIIRNS_6vectorImEEEEENS_4lessImEEEEvOT_T0_+0x80) [0x4becd0]
./benchmark(_Z14benchmark_sortIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x189) [0x4bf049]
./benchmark(_Z9run_testsIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x4be) [0x4c8efe]
./benchmark(main+0x745) [0x43d525]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2af97c343c05]
./benchmark() [0x43db52]

default_program(646): warning: declaration does not declare anything

default_program(657): error: union "Shared" has no member "keys0"

default_program(658): error: union "Shared" has no member "keys0"

default_program(716): error: union "Shared" has no member "keys0"

default_program(717): error: union "Shared" has no member "keys0"

4 errors detected in the compilation of "default_program".

*/vexcl/vexcl/backend/cuda/compiler_nvrtc.hpp:104
	NVRTC Error (6 - NVRTC_ERROR_COMPILATION)
^C

ddemidov, Dec 02 '18 18:12