miniBUDE
OpenCL version segfaults on Intel NEO
Running the OpenCL version at the current commit (https://github.com/UoB-HPC/bude-portability-benchmark/commit/37a6bd8f4b23da39973cfb2be3b7bd6798792782) on an Intel UHD 630 with Intel's NEO OpenCL driver produces a segfault:
Running OpenCL
[New Thread 0x7fffef5db700 (LWP 219939)]
Using device: Intel(R) Gen9 HD Graphics NEO
Thread 1 "bude-opencl" received signal SIGSEGV, Segmentation fault.
0x00007fffe0d989bf in clang::serialization::BasicReaderBase<clang::ASTRecordReader>::readDeclarationName() () from /lib64/../lib64/libclang-cpp.so.10
Missing separate debuginfos, use: dnf debuginfo-install clang-libs-10.0.1-2.fc32.x86_64 intel-gmmlib-20.2.2-1.fc32.x86_64 intel-igc-core-1.0.4241-1.fc32.x86_64 intel-igc-opencl-1.0.4241-1.fc32.x86_64 intel-opencl-20.28.17293-1.fc32.x86_64 intel-opencl-clang-10.0.12-1.fc32.x86_64 libedit-3.1-32.20191231cvs.fc32.x86_64 libffi-3.1-24.fc32.x86_64 libgcc-10.2.1-1.fc32.x86_64 libgomp-10.2.1-1.fc32.x86_64 libstdc++-10.2.1-1.fc32.x86_64 libva-2.7.1-1.fc32.x86_64 llvm-libs-10.0.1-4.fc32.x86_64 ncurses-libs-6.1-15.20191109.fc32.x86_64 nvidia-driver-cuda-libs-455.28-1.fc32.x86_64 ocl-icd-2.2.13-1.fc32.x86_64 spirv-llvm-translator-10.0.12-1.fc32.x86_64 zlib-1.2.11-21.fc32.x86_64
(gdb) backtrace
#0 0x00007fffe0d989bf in clang::serialization::BasicReaderBase<clang::ASTRecordReader>::readDeclarationName() () from /lib64/../lib64/libclang-cpp.so.10
#1 0x00007fffe0dd8c6e in clang::ASTDeclReader::VisitNamedDecl(clang::NamedDecl*) () from /lib64/../lib64/libclang-cpp.so.10
#2 0x00007fffe0dd9285 in clang::ASTDeclReader::VisitValueDecl(clang::ValueDecl*) () from /lib64/../lib64/libclang-cpp.so.10
#3 0x00007fffe0dd9319 in clang::ASTDeclReader::VisitDeclaratorDecl(clang::DeclaratorDecl*) () from /lib64/../lib64/libclang-cpp.so.10
#4 0x00007fffe0de94f7 in clang::ASTDeclReader::VisitFunctionDecl(clang::FunctionDecl*) () from /lib64/../lib64/libclang-cpp.so.10
#5 0x00007fffe0df07f6 in clang::ASTDeclReader::Visit(clang::Decl*) () from /lib64/../lib64/libclang-cpp.so.10
#6 0x00007fffe0df0c2b in clang::ASTReader::ReadDeclRecord(unsigned int) () from /lib64/../lib64/libclang-cpp.so.10
#7 0x00007fffe0d8da91 in clang::ASTReader::GetDecl(unsigned int) () from /lib64/../lib64/libclang-cpp.so.10
#8 0x00007fffe0db087e in clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) () from /lib64/../lib64/libclang-cpp.so.10
#9 0x00007fffe0dbade3 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule>*) () from /lib64/../lib64/libclang-cpp.so.10
#10 0x00007fffe0f1bd50 in clang::CompilerInstance::loadModuleFile(llvm::StringRef) () from /lib64/../lib64/libclang-cpp.so.10
#11 0x00007fffe0f5c49c in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) () from /lib64/../lib64/libclang-cpp.so.10
#12 0x00007fffe0f13269 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /lib64/../lib64/libclang-cpp.so.10
#13 0x00007fffe0fcd12c in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) () from /lib64/../lib64/libclang-cpp.so.10
#14 0x00007fffe1ddf6b2 in Compile () from /lib64/libopencl-clang.so.10
#15 0x00007fffed498137 in TC::CClangTranslationBlock::TranslateClang(TC::TranslateClangArgs const*, TC::STB_TranslateOutputArgs*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char const*) () from /lib64/libigdfcl.so.1
#16 0x00007fffed499a5e in TC::CClangTranslationBlock::Translate(TC::STB_TranslateInputArgs const*, TC::STB_TranslateOutputArgs*) () from /lib64/libigdfcl.so.1
#17 0x00007fffed49d969 in IGC::FclOclTranslationCtx<0ul>::Impl::Translate(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int) () from /lib64/libigdfcl.so.1
#18 0x00007ffff2153a7a in NEO::CompilerInterface::build(NEO::Device const&, NEO::TranslationInput const&, NEO::TranslationOutput&) () from /usr/lib64/intel-opencl/libigdrcl.so
#19 0x00007ffff1f9bbef in NEO::Program::build(unsigned int, _cl_device_id* const*, char const*, void (*)(_cl_program*, void*), void*, bool) () from /usr/lib64/intel-opencl/libigdrcl.so
#20 0x00007ffff1f3d7c8 in clBuildProgram () from /usr/lib64/intel-opencl/libigdrcl.so
#21 0x00007ffff7f80472 in clBuildProgram () from /lib64/libOpenCL.so.1
#22 0x0000000000402908 in initCL () at bude.c:674
#23 0x0000000000402a7f in runOpenCL (results=results@entry=0x4243a0) at bude.c:266
#24 0x000000000040130e in main (argc=<optimized out>, argv=<optimized out>) at bude.c:97
So the crash happens at runtime inside clBuildProgram, while Intel's compiler stack is building the kernel. TBH this looks like a bug in Intel's OpenCL runtime.
As a sanity check, I ran the exact same binary on an Nvidia Quadro P1000, and the results were correct:
./bude-opencl --device 1 -w 4 -p 1 -i 8
Running C/OpenMP
- Total time: 1699.10 ms
- Average time: 212.39 ms
- Interactions/s: 0.47 billion
- GFLOP/s: 19.29
Running OpenCL
Using device: Quadro P1000
- Total time: 642.43 ms
- Average time: 80.30 ms
- Interactions/s: 1.24 billion
- GFLOP/s: 51.03
OpenMP OpenCL (diff)
865.52 vs 865.52 ( 0.00%)
25.07 vs 25.07 ( 0.00%)
368.43 vs 368.43 ( 0.00%)
14.67 vs 14.67 ( 0.00%)
574.99 vs 574.99 ( 0.00%)
707.35 vs 707.35 ( 0.00%)
33.95 vs 33.95 ( 0.00%)
135.59 vs 135.59 ( 0.00%)
Largest difference was 0.000%
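For reference, a minimal C sketch of the kind of check behind the OpenMP/OpenCL comparison table above. It assumes the difference is measured relative to the OpenMP (reference) energy and expressed as a percentage; the function names are hypothetical and this is not necessarily the formula bude.c uses.

```c
#include <math.h>   /* HUGE_VAL */
#include <stddef.h> /* size_t */

/* Hypothetical helper: relative difference between a reference energy
   (OpenMP) and a test energy (OpenCL), in percent. Normalising by the
   reference value is an assumption. */
static double percent_diff(double reference, double value)
{
    double diff = value - reference;
    double scale = reference;
    if (diff < 0.0)
        diff = -diff;
    if (scale < 0.0)
        scale = -scale;
    if (scale == 0.0)
        return diff == 0.0 ? 0.0 : HUGE_VAL; /* avoid dividing by zero */
    return diff / scale * 100.0;
}

/* Hypothetical helper: largest percentage difference over n pose energies,
   i.e. the kind of figure reported as "Largest difference was ...%". */
static double largest_percent_diff(const float *reference, const float *value,
                                   size_t n)
{
    double largest = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = percent_diff(reference[i], value[i]);
        if (d > largest)
            largest = d;
    }
    return largest;
}
```

Printing the result with `printf("Largest difference was %.3f%%\n", ...)` would give a line in the same format as the output above.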
I suspect it's something to do with the VLA usage for etot and friends.
This is also a problem for SYCL, since we can't use VLAs there; we'll probably have to implement this with some sort of 2D local memory...
FWIW there are no VLAs in the OpenCL kernel. The NUM_TD_PER_THREAD value is a preprocessor macro defined via the clBuildProgram compiler options, so it is a compile-time constant, and therefore etot is just a regular fixed-size array.
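To illustrate the point about NUM_TD_PER_THREAD, here is a minimal C sketch, not taken from bude.c: a hypothetical host-side helper formats the `-D` option for clBuildProgram, and a plain-C stand-in for the kernel shows that an array sized by such a macro is fixed-size. The fallback value of 4 is an assumption that only lets the sketch compile on its own.

```c
#include <stdio.h> /* snprintf, size_t */

/* Hypothetical host-side helper: formats the compiler options that would be
   passed as the `options` argument of clBuildProgram, turning a runtime value
   into a preprocessor macro for the kernel source. */
static int make_build_options(char *buf, size_t len, int td_per_thread)
{
    return snprintf(buf, len, "-DNUM_TD_PER_THREAD=%d", td_per_thread);
}

/* Plain-C stand-in for the kernel side. When the kernel is built with the
   option above, the macro is already defined; this fallback (an assumption)
   only lets the sketch compile standalone. */
#ifndef NUM_TD_PER_THREAD
#define NUM_TD_PER_THREAD 4
#endif

static int etot_length(void)
{
    /* The macro expands to an integer literal, so this is an ordinary
       fixed-size array, not a VLA. */
    float etot[NUM_TD_PER_THREAD] = {0.0f};

    /* sizeof on a fixed-size array is a constant expression, so this check
       compiles; with a VLA it would be a compile-time error. */
    _Static_assert(sizeof etot / sizeof etot[0] == NUM_TD_PER_THREAD,
                   "etot must be a fixed-size array");

    return (int)(sizeof etot / sizeof etot[0]);
}
```

On the host, the resulting string would be passed as the options argument, e.g. `clBuildProgram(program, 1, &device, opts, NULL, NULL)`, so the array length is fixed when the kernel is built.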
Sorry, SYCL doesn't need a separate runtime kernel-compilation step, so during the port I immediately replaced the NUM_TD_PER_THREAD macro with the actual wgSize, and then confused myself into thinking the original codebase used a VLA.