Tapir-LLVM icon indicating copy to clipboard operation
Tapir-LLVM copied to clipboard

Assertion `Returns.empty() && "Returns cloned when cloning loop."' failed.

Open stelleg opened this issue 7 years ago • 16 comments

When building the cilk version of a simple finite elements application: https://mantevo.org/downloads/miniFE_2.0.1.html

Have done a small amount of digging, but thought I'd submit an issue in case the fix is obvious to someone else. I'll keep digging in the meantime.

Error output:

In file included from main.cpp:47:
./driver.hpp:125:1: warning: Tapir loop not transformed: failed to use divide-and-conquer loop spawning [-Wpass-failed=loop-spawning]
driver(const Box& global_box, Box& my_box,
^
./driver.hpp:125:1: warning: Tapir loop not transformed: failed to use divide-and-conquer loop spawning [-Wpass-failed=loop-spawning]
clang-5.0: /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1226: virtual bool {anonymous}::DACLoopSpawning::processLoop(): Assertion `Returns.empty() && "Returns cloned when cloning loop."' failed.
#0 0x0000560f2563968f llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:398:0
#1 0x0000560f25639722 PrintStackTraceSignalHandler(void*) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:462:0
#2 0x0000560f25637955 llvm::sys::RunSignalHandlers() /home/george/tasks/tapir/parallel-ir/lib/Support/Signals.cpp:49:0
#3 0x0000560f25638efb SignalHandler(int) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:252:0
#4 0x00007fd26b932da0 __restore_rt (/usr/lib/libpthread.so.0+0x11da0)
#5 0x00007fd26a464860 __GI_raise (/usr/lib/libc.so.6+0x34860)
#6 0x00007fd26a465ec9 __GI_abort (/usr/lib/libc.so.6+0x35ec9)
#7 0x00007fd26a45d0bc __assert_fail_base (/usr/lib/libc.so.6+0x2d0bc)
#8 0x00007fd26a45d133 (/usr/lib/libc.so.6+0x2d133)
#9 0x0000560f26698ab3 (anonymous namespace)::DACLoopSpawning::processLoop() /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1226:0
#10 0x0000560f2669b598 (anonymous namespace)::LoopSpawningImpl::processLoop(llvm::Loop*) /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1686:0
#11 0x0000560f2669adf3 (anonymous namespace)::LoopSpawningImpl::run() /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1627:0
#12 0x0000560f2669bda3 (anonymous namespace)::LoopSpawning::runOnFunction(llvm::Function&) /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1818:0
#13 0x0000560f24f1f99a llvm::FPPassManager::runOnFunction(llvm::Function&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1514:0
#14 0x0000560f24f1fb3f llvm::FPPassManager::runOnModule(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1535:0
#15 0x0000560f24f1fec7 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1591:0
#16 0x0000560f24f205f1 llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1694:0
#17 0x0000560f24f207e9 llvm::legacy::PassManager::run(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1726:0
#18 0x0000560f258e5b09 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/BackendUtil.cpp:842:0
#19 0x0000560f258e7d2a clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/BackendUtil.cpp:1192:0
#20 0x0000560f264369c3 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/CodeGenAction.cpp:261:0
#21 0x0000560f271abe80 clang::ParseAST(clang::Sema&, bool, bool) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Parse/ParseAST.cpp:161:0
#22 0x0000560f25f3a6fb clang::ASTFrontendAction::ExecuteAction() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/FrontendAction.cpp:1003:0
#23 0x0000560f26434784 clang::CodeGenAction::ExecuteAction() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/CodeGenAction.cpp:993:0
#24 0x0000560f25f3a13e clang::FrontendAction::Execute() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/FrontendAction.cpp:906:0
#25 0x0000560f25ed6e38 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/CompilerInstance.cpp:981:0
#26 0x0000560f260850f8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:251:0
#27 0x0000560f2358dca3 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/cc1_main.cpp:221:0
#28 0x0000560f23582e5c ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/driver.cpp:306:0
#29 0x0000560f23583a38 main /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/driver.cpp:387:0
#30 0x00007fd26a450f4a __libc_start_main (/usr/lib/libc.so.6+0x20f4a)
#31 0x0000560f2358053a _start (/home/george/tasks/tapir/build/bin/clang-5.0+0x1b6a53a)
Stack dump:
0.	Program arguments: /home/george/tasks/tapir/build/bin/clang-5.0 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name main.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -momit-leaf-frame-pointer -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/george/tasks/mantevo/miniFE/miniFE-2.0_cilk/src/main.gcno -resource-dir /home/george/tasks/tapir/build/lib/clang/5.0.0 -I . -I ../utils -I ../fem -D MINIFE_SCALAR=double -D MINIFE_LOCAL_ORDINAL=int -D MINIFE_GLOBAL_ORDINAL=int -D MINIFE_CSR_MATRIX -D MINIFE_INFO=1 -D MINIFE_KERNELS=0 -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1 -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1/x86_64-pc-linux-gnu -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1/backward -internal-isystem /usr/local/include -internal-isystem /home/george/tasks/tapir/build/lib/clang/5.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -fdeprecated-macro -fdebug-compilation-dir /home/george/tasks/mantevo/miniFE/miniFE-2.0_cilk/src -ferror-limit 19 -fmessage-length 138 -ftapir=cilk -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o main.o -x c++ main.cpp 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'Function Pass Manager' on module 'main.cpp'.
4.	Running pass 'Loop Spawning' on function '@_ZN6miniFE25generate_matrix_structureINS_9CSRMatrixIdiiEEEEiRKNS_23simple_mesh_descriptionINT_17GlobalOrdinalTypeEEERS4_'
clang-5.0: error: unable to execute command: Aborted (core dumped)
clang-5.0: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 5.0.0 ([email protected]:wsmoses/cilk-clang f81bbab46561384a0709c399c4cb0df9e4a3080b) ([email protected]:wsmoses/Parallel-IR.git f9d48f08baf80738b1501aa492df9b8dbd1521e6)
Target: x86_64-unknown-linux-gnu
Thread model: posix

stelleg avatar Dec 19 '17 18:12 stelleg

Apologies, was away on vacation. Quick question: is this using your clang frontend? Likewise would it be possible to post the IR at the start of loopspawning.

My guess is that there is a memory-related thing on the loop bounds.

wsmoses avatar Jan 10 '18 18:01 wsmoses

Using your cilk-clang master. Only happens with optimizations turned on -O2. Haven't had a chance to dig deeper, sorry. Here's the relevant .ll file with -O0: https://gist.github.com/stelleg/89f40a1d726600956d72ed580194fcbc

stelleg avatar Jan 10 '18 19:01 stelleg

Are you sure this is the latest version? [The hashes in your error don't seem to be in the X most recent commits and I tried compiling the ll file with the latest version and it seems fine up to a linker error.]

wsmoses avatar Jan 11 '18 19:01 wsmoses

Yep, getting the error running parallel-ir d89ba180c46fbe2fda5e2a5f595820bf4b75880d and cilk-clang 076e3106215e6a17a659ec1d015fdacf86f57ff2'

Strange that it's not failing when compiling the .ll file for you. Did you compile with -O2?

In the above error message I believe I was using my local version, thus the different hash, but I'm seeing the error with your latest heads as well.

stelleg avatar Jan 11 '18 20:01 stelleg

I managed to recreate this error. I'll try to dive into the issue and see what's going wrong.

neboat avatar Jan 11 '18 22:01 neboat

@neboat I found the problematic cilk_for loop. It's at SparseMatrix_functions.hpp:75. Replacing the cilk_for with a for fixes the compilation problem. I've tried replacing the termination condition with something slightly simpler, but it seems to be the body of the loop that's the issue.

stelleg avatar Jan 11 '18 22:01 stelleg

The problematic LLVM IR:

pfor.cond:                                        ; preds = %pfor.inc, %entry
  %10 = load i32, i32* %__begin, align 4
  %11 = load i32, i32* %__end, align 4
  %cmp = icmp slt i32 %10, %11
  br i1 %cmp, label %pfor.detach, label %pfor.end

pfor.detach:                                      ; preds = %pfor.cond
  %12 = load i32, i32* %__init, align 4
  %13 = load i32, i32* %__begin, align 4
  %mul = mul nsw i32 %13, 1
  %add2 = add nsw i32 %12, %mul
  detach within %syncreg, label %pfor.body.entry, label %pfor.inc

pfor.body.entry:                                  ; preds = %pfor.detach
  %i = alloca i32, align 4
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  store i32 %add2, i32* %i, align 4
  br label %pfor.body

pfor.body:                                        ; preds = %pfor.body.entry
  %14 = load i32, i32* %i, align 4
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* %mat_init, i32 %14)
          to label %invoke.cont unwind label %lpad

invoke.cont:                                      ; preds = %pfor.body
  br label %pfor.preattach

pfor.preattach:                                   ; preds = %invoke.cont
  reattach within %syncreg, label %pfor.inc

pfor.inc:                                         ; preds = %pfor.preattach, %pfor.detach
  %15 = load i32, i32* %__begin, align 4
  %inc = add nsw i32 %15, 1
  store i32 %inc, i32* %__begin, align 4
  br label %pfor.cond, !llvm.loop !4

lpad:                                             ; preds = %pfor.body
  %16 = landingpad { i8*, i32 }
          cleanup
  %17 = extractvalue { i8*, i32 } %16, 0
  store i8* %17, i8** %exn.slot, align 8
  %18 = extractvalue { i8*, i32 } %16, 1
  store i32 %18, i32* %ehselector.slot, align 4
  br label %det.rethrow

det.rethrow:                                      ; preds = %lpad
  br label %eh.resume

eh.resume:                                        ; preds = %det.rethrow
  %exn = load i8*, i8** %exn.slot, align 8
  %sel = load i32, i32* %ehselector.slot, align 4
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
  %lpad.val3 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
  resume { i8*, i32 } %lpad.val3

pfor.end:                                         ; preds = %pfor.cond
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue:                                ; preds = %pfor.end
  ret void

stelleg avatar Jan 11 '18 22:01 stelleg

Thanks for the info. Is this the IR you get from -O2 compilation?

neboat avatar Jan 11 '18 22:01 neboat

Nope, couldn't get the Tapir IR from -O2 due to the error reported above. I believe the above is -O0. Here's -O1:

pfor.detach.preheader:                            ; preds = %entry
  br label %pfor.detach

pfor.cond.cleanup:                                ; preds = %pfor.inc, %entry
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue:                                ; preds = %pfor.cond.cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  ret void

pfor.detach:                                      ; preds = %pfor.detach.preheader, %pfor.inc
  %__begin.017 = phi i32 [ %inc, %pfor.inc ], [ 0, %pfor.detach.preheader ]
  detach within %syncreg, label %pfor.body, label %pfor.inc

pfor.body:                                        ; preds = %pfor.detach
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* nonnull %mat_init, i32 %__begin.017)
          to label %pfor.preattach unwind label %lpad

pfor.preattach:                                   ; preds = %pfor.body
  reattach within %syncreg, label %pfor.inc

pfor.inc:                                         ; preds = %pfor.preattach, %pfor.detach
  %inc = add nuw nsw i32 %__begin.017, 1
  %cmp = icmp slt i32 %inc, %1
  br i1 %cmp, label %pfor.detach, label %pfor.cond.cleanup, !llvm.loop !145

lpad:                                             ; preds = %pfor.body
  %2 = landingpad { i8*, i32 }
          cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  resume { i8*, i32 } undef

stelleg avatar Jan 11 '18 22:01 stelleg

You think it has to do with the interaction between exception handling, i.e. invoke/resume, and Tapir?

stelleg avatar Jan 11 '18 22:01 stelleg

Ah, OK. (Sorry, I realized in hindsight that my question was silly.) Thanks again for the info. Now I know where to look.

I suspect there's some issue with Tapir and exceptions, but I don't think it's the resume. (Still, it would be nice to have an excuse to implement my master plan for integrating exceptions with Tapir...)

neboat avatar Jan 11 '18 22:01 neboat

I'm glad to hear there's a master plan :). Seems like getting tapir and exceptions to play nicely could be non-trivial.

stelleg avatar Jan 11 '18 22:01 stelleg

OK, I think I've identified the problem, and it is with exception handling. A quick work around (that preserves the cilk_for you identified before) is to add __attribute__((noinline)) to init_matrix in SparseMatrix_functions.hpp. I'm thinking through a better fix, but hopefully this change will still let you enjoy some Cilk parallelism in your code.

neboat avatar Jan 11 '18 23:01 neboat

Thanks for the workaround, did the job for me.

stelleg avatar Jan 12 '18 18:01 stelleg

Wanted to give you a quick update. I'm currently testing a fix to Tapir's integration with exception-handling code. On my machine, I can now successfully build this test case without any work around. I would like to try running this test case and to try running the race detector on it. Can you please advise me on how to run this program?

neboat avatar Jan 27 '18 18:01 neboat

Nice! Assuming you've successfully built it, it should be miniFE.x in the src subdirectory. You can run it with no arguments to get a trivially small test case, or if you want something longer running, you can increase the size of the problem, e.g. ./miniFE.x --nx 100 --ny 100 --nz 100. Let me know if you have any issues.

stelleg avatar Jan 28 '18 05:01 stelleg