pocl
Some kernels make the compiler spend lots of time in BarrierTailReplication::FindBarriersDFS
This kernel:
https://gist.github.com/inducer/ba49ccf62e654b1947c0
takes a very long time to compile (on current pocl git). I've not seen it finish, but then I'm not very patient...
This is the backtrace when I interrupt it:
Program received signal SIGINT, Interrupt.
0x00007fffd82e1295 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
(gdb) bt
#0 0x00007fffd82e1295 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#1 0x00007fffd82e1a22 in llvm::GraphTraits<llvm::BasicBlock*>::NodeType* llvm::Eval<llvm::GraphTraits<llvm::BasicBlock*> >(llvm::DominatorTreeBase<llvm::GraphTraits<llvm::BasicBlock*>::NodeType>&, llvm::GraphTraits<llvm::BasicBlock*>::NodeType*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#2 0x00007fffd82e5fb0 in void llvm::Calculate<llvm::Function, llvm::BasicBlock*>(llvm::DominatorTreeBase<llvm::GraphTraits<llvm::BasicBlock*>::NodeType>&, llvm::Function&) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#3 0x00007fffd82e70c9 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#4 0x00007fffd82e774d in llvm::DominatorTreeWrapperPass::runOnFunction(llvm::Function&) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#5 0x00007fffda4eda84 in pocl::BarrierTailReplication::ReplicateJoinedSubgraphs (this=0x2c93fb0, dominator=0x6c08900, subgraph_entry=0x6c08900, processed_bbs=std::set with 17 elements = {...}) at BarrierTailReplication.cc:242
#6 0x00007fffda4ed864 in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6c08900, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:162
#7 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6c08500, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#8 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6c1b810, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#9 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6bb81d0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#10 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b9dcc0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#11 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b9db90, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#12 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b459e0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#13 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b457b0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#14 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b9eb80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#15 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b203d0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#16 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1fda0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#17 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1fc30, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#18 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1e710, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#19 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1e280, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#20 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1deb0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#21 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6b1d9e0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#22 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e5d80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#23 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e5a00, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#24 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e58c0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#25 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e5410, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#26 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e5300, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#27 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e5010, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#28 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66b2510, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#29 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66b20f0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#30 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66b1b90, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#31 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e7b80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#32 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x66e7fb0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#33 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656dd20, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#34 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6676bc0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#35 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6676a80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#36 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6676670, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#37 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6676560, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#38 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6678b50, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#39 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6678a40, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#40 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6678790, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#41 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6675600, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#42 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666c570, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#43 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666c050, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#44 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666be60, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#45 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666b9c0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#46 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x65c1280, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#47 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666d590, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#48 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x666a990, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#49 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656a8f0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#50 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x653f0c0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#51 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x653efc0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#52 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x653ecd0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#53 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656c7b0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#54 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656d2b0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#55 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656cee0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#56 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656ca10, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#57 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x652fcb0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#58 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x652f7d0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#59 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x652f5b0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#60 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656b390, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#61 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656b720, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#62 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x6530530, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#63 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f11bf0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#64 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f11ac0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#65 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f11420, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#66 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f11320, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#67 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656ef00, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#68 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x315a940, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#69 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x315a480, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#70 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x315b080, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#71 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x656e980, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#72 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x315acb0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#73 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2ee12a0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#74 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f124a0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#75 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f123d0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#76 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2e59900, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#77 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2e59830, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#78 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2ea68a0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#79 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f14230, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#80 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2bcebd0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#81 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2bceb30, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#82 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2e00740, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#83 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x31ae3f0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#84 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2f19a80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#85 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56acc30, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#86 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2ea7350, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#87 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56acb30, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#88 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x31582e0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#89 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56ac8b0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#90 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2e5b3c0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#91 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56ac660, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#92 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2c767c0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#93 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2c74260, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#94 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x2e7af20, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#95 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x5779c80, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#96 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x5779b20, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#97 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56ad470, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#98 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56abdf0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#99 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x56ac080, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#100 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x3158420, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#101 0x00007fffda4ed8cd in pocl::BarrierTailReplication::FindBarriersDFS (this=0x2c93fb0, bb=0x57792e0, processed_bbs=std::set with 1823 elements = {...}) at BarrierTailReplication.cc:169
#102 0x00007fffda4ed7bd in pocl::BarrierTailReplication::ProcessFunction (this=0x2c93fb0, F=...) at BarrierTailReplication.cc:135
#103 0x00007fffda4ed6b9 in pocl::BarrierTailReplication::runOnFunction (this=0x2c93fb0, F=...) at BarrierTailReplication.cc:107
#104 0x00007fffd837c3a7 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#105 0x00007fffd837c42b in llvm::FPPassManager::runOnModule(llvm::Module&) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#106 0x00007fffd837ee19 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.6.so.1
#107 0x00007fffda4c45b7 in pocl_llvm_generate_workgroup_function (device=0x2f8db30, kernel=0x2f36e20, local_x=16, local_y=16, local_z=1) at pocl_llvm_api.cc:1359
#108 0x00007fffda4a2db3 in POclEnqueueNDRangeKernel (command_queue=0x2f83390, kernel=0x2f36e20, work_dim=2, global_work_offset=0x0, global_work_size=0x7fffffffcd90, local_work_size=0x7fffffffcd70, num_events_in_wait_list=0, event_wait_list=0x0, event=0x7fffffffcbc8) at clEnqueueNDRangeKernel.c:240
#109 0x00007fffe7a56fbe in _call_func<int (*)(_cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**), 0, 1, 2, 3, 4, 5, 6, 7, 8, _cl_command_queue* const&, _cl_kernel* const&, unsigned int&, unsigned long const*&, unsigned long const*&, unsigned long const*&, unsigned long, _cl_event**, _cl_event**> (args=..., func=<optimized out>) at src/c_wrapper/function.h:38
#110 call_tuple<int (*&)(_cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**), std::tuple<_cl_command_queue* const&, _cl_kernel* const&, unsigned int&, unsigned long const*&, unsigned long const*&, unsigned long const*&, unsigned long, _cl_event**, _cl_event**> > (args=<optimized out>, func=<synthetic pointer>) at src/c_wrapper/function.h:49
#111 call<__CLArgGetter, int (*)(_cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**)> (func=<optimized out>, this=0x7fffffffcbf0) at src/c_wrapper/function.h:108
#112 clcall<int (*)(_cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**)> (name=0x7fffe7a5fc3e "clEnqueueNDRangeKernel", func=<optimized out>, this=0x7fffffffcbf0) at src/c_wrapper/error.h:186
#113 call_guarded<command_queue*&, kernel*&, unsigned int&, unsigned long const*&, unsigned long const*&, unsigned long const*&, pyopencl_buf<_cl_event*> const&, _CLObjOutArg<event>, _cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**> (
name=0x7fffe7a5fc3e "clEnqueueNDRangeKernel", func=<optimized out>) at src/c_wrapper/error.h:218
#114 <lambda()>::operator()(void) const (__closure=__closure@entry=0x7fffffffccd0) at src/c_wrapper/kernel.cpp:180
#115 0x00007fffe7a57457 in retry_mem_error<enqueue_nd_range_kernel(clbase**, clobj_t, clobj_t, cl_uint, const size_t*, const size_t*, const size_t*, clbase* const*, uint32_t)::<lambda()> > (func=...) at src/c_wrapper/error.h:294
#116 operator() (__closure=<synthetic pointer>) at src/c_wrapper/error.h:307
#117 c_handle_error<c_handle_retry_mem_error(Func&&) [with Func = enqueue_nd_range_kernel(clbase**, clobj_t, clobj_t, cl_uint, const size_t*, const size_t*, const size_t*, clbase* const*, uint32_t)::<lambda()>]::<lambda()> > (func=...) at src/c_wrapper/error.h:271
#118 c_handle_retry_mem_error<enqueue_nd_range_kernel(clbase**, clobj_t, clobj_t, cl_uint, const size_t*, const size_t*, const size_t*, clbase* const*, uint32_t)::<lambda()> > (func=<optimized out>) at src/c_wrapper/error.h:307
#119 enqueue_nd_range_kernel (evt=0x7fffd4720338, _queue=<optimized out>, _knl=<optimized out>, work_dim=2, global_work_offset=0x0, global_work_size=0x7fffffffcd90, local_work_size=0x7fffffffcd70, _wait_for=0x0, num_wait_for=0) at src/c_wrapper/kernel.cpp:181
#120 0x00007fffe7a01e73 in _cffi_f_enqueue_nd_range_kernel (self=<optimized out>, args=<optimized out>) at build/temp.linux-x86_64-2.7/pyopencl._cffi.cpp:4525
#121 0x00000000004c9ae5 in PyEval_EvalFrameEx ()
#122 0x00000000004c8329 in PyEval_EvalCodeEx ()
#123 0x00000000004c9ff6 in PyEval_EvalFrameEx ()
#124 0x00000000004e42d0 in ?? ()
#125 0x00000000004cd352 in PyEval_EvalFrameEx ()
#126 0x00000000004e42d0 in ?? ()
#127 0x0000000000502608 in ?? ()
#128 0x00000000004b34ce in PyObject_Call ()
#129 0x00000000005c815c in ?? ()
#130 0x00000000004cd352 in PyEval_EvalFrameEx ()
#131 0x00000000004c8329 in PyEval_EvalCodeEx ()
#132 0x00000000004c9ff6 in PyEval_EvalFrameEx ()
#133 0x00000000004c8329 in PyEval_EvalCodeEx ()
#134 0x00000000004cb577 in PyEval_EvalFrameEx ()
#135 0x00000000004c8329 in PyEval_EvalCodeEx ()
#136 0x000000000053627a in PyRun_StringFlags ()
#137 0x00000000004cf567 in PyEval_EvalFrameEx ()
#138 0x00000000004c8329 in PyEval_EvalCodeEx ()
#139 0x000000000050111f in ?? ()
#140 0x00000000004f6e22 in PyRun_FileExFlags ()
#141 0x00000000004f5f17 in PyRun_SimpleFileExFlags ()
#142 0x0000000000497afd in Py_Main ()
#143 0x00007ffff6f14b45 in __libc_start_main (main=0x497590 <main>, argc=3, argv=0x7fffffffe268, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe258) at libc-start.c:287
#144 0x00000000004974b8 in _start ()
I have some (bitrotted) work towards improving the parallel region analysis in a private branch which I need to clean up and upstream. It should speed up the analysis of trickier kernels. Hopefully I'll find time for working on it after the vacation period.
Note to myself, the pocl-standalone script still works. This can be reproduced with pocl-standalone -h stencil.h -o stencil.bc stencil.cl
Has there been any progress on this? I have a moderately complex kernel that just takes forever to compile. I've been waiting for several minutes now, stack trace still incredibly deep.
Did you try with master? There have been a bunch of updates to kernel compilation which might also affect this.
I just tried with current master (instead of the version provided by Ubuntu Zesty) and the problem is still present.
OK, I replaced some conditional code with a branchless equivalent and now pocl is able to compile it. So the problem seems to be caused by too many conditionals inside loops (or something similar)...
I have a hunch about what causes it, but unfortunately I haven't had enough uninterrupted time to fix it, sorry. I'd guess that adding extra barriers between the ifs works around it too.
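For readers wondering what "replacing conditional code with a branchless equivalent" can look like, here is a minimal sketch in plain C. It is a hypothetical example and not the kernel from this thread; `clamp_branchy`, `select_int` and `clamp_branchless` are made-up names. In OpenCL C the built-in `select()` plays a similar role to the helper shown here.

```c
/* Hypothetical illustration only; not taken from the kernel in this thread. */

/* Branchy form: each `if` adds a conditional edge to the control-flow graph. */
static int clamp_branchy(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* Returns `a` when cond is non-zero and `b` otherwise, without an `if`. */
static int select_int(int cond, int a, int b)
{
    int mask = -(cond != 0);          /* all bits set if cond holds, else 0 */
    return (a & mask) | (b & ~mask);
}

/* Branchless form: same result, but straight-line code in the source. */
static int clamp_branchless(int x, int lo, int hi)
{
    int t = select_int(x < lo, lo, x);
    return select_int(t > hi, hi, t);
}
```

Both functions return the same value for any input with `lo <= hi`; the second one simply gives the compiler no branches to analyze.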
Sure, no problem. My original code was not optimal anyway (it's targeting GPUs), so at least it forced me to improve it :)
Ping. Is there any news on this? I recently encountered the same excessively long compile time with an OpenCL kernel from OpenCV. It took ~20 s to compile and shows the same long backtrace in BarrierTailReplication::FindBarriersDFS when interrupted.
Reduced kernel file that depicts the issue: https://gist.github.com/hominhquan/f488cabafe11b6c878a31c989b00f346
$ cpuinfo
...
Model name: AMD Ryzen 7 3700X 8-Core Processor
CPU MHz: 1862.848
CPU max MHz: 3600.0000
CPU min MHz: 2200.0000
...
$ POCL_DEBUG=1 time ./bin/poclcc objdetect_hog.cl
[2022-04-05 09:13:55.373080484]POCL: in fn pocl_init_devices at line 509:
| GENERAL | Installing SIGFPE handler...
[2022-04-05 09:13:55.378145574]POCL: in fn pocl_init_devices at line 458:
| GENERAL | FIRST INIT done; REINIT all devices
[2022-04-05 09:13:55.378168828]POCL: in fn compile_and_link_program at line 511:
| GENERAL | building program with options (null)
[2022-04-05 09:13:55.378178577]POCL: in fn compile_and_link_program at line 566:
| GENERAL | building from sources for device 0
[2022-04-05 09:14:16.350944598]POCL: in fn llvm_codegen at line 193:
| GENERAL | Linking final module
[2022-04-05 09:14:16.350963163]POCL: in fn pocl_run_command at line 1048:
| GENERAL | Launching: /usr/bin/ld
[2022-04-05 09:14:16.372203538]POCL: in fn pocl_check_dlhandle_cache at line 929:
| GENERAL | Using static WG size binary: /nfs/home/mqho/.cache/pocl/kcache/KF/NENAPBCFEOJBKMGNKAJHMPCKHPEDAHMKELKJA/normalize_hists_kernel/0-0-0/normalize_hists_kernel.so
[2022-04-05 09:14:16.375370631]POCL: in fn pocl_binary_serialize at line 628:
| GENERAL | serializing program.bc: /nfs/home/mqho/.cache/pocl/kcache/KF/NENAPBCFEOJBKMGNKAJHMPCKHPEDAHMKELKJA/program.bc
[2022-04-05 09:14:16.377452264]POCL: in fn serialize_kernel_cachedir at line 381:
| GENERAL | Kernel normalize_hists_kernel: recur serializing cachedir /nfs/home/mqho/.cache/pocl/kcache/KF/NENAPBCFEOJBKMGNKAJHMPCKHPEDAHMKELKJA/normalize_hists_kernel
[2022-04-05 09:14:16.380078641]POCL: in fn pocl_uninit_devices at line 352:
| GENERAL | UNINIT all devices
19.77user 1.16system 0:21.06elapsed 99%CPU (0avgtext+0avgdata 229408maxresident)k
The kernel pattern is likely one with lots of branches and a barrier() inside each of these if() blocks, which generates a kind of exponential complexity in the execution flow.
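To make the "exponential" intuition concrete, the sketch below counts entry-to-exit paths through a chain of `n` if/else diamonds. It is only an illustration of how quickly distinct paths multiply; it is not pocl code, and the node numbering (head `3*i`, then-block `3*i+1`, else-block `3*i+2`, exit `3*n`) is an assumption made up for this example.

```c
#include <stdint.h>

/*
 * Count distinct paths from `node` to the exit of a chain of `n` diamonds.
 * Hypothetical layout: diamond i has head 3*i, then-block 3*i+1,
 * else-block 3*i+2, and joins at 3*(i+1). The exit node is 3*n.
 */
static uint64_t count_paths_from(unsigned node, unsigned n)
{
    unsigned exit_node = 3 * n;

    if (node == exit_node)
        return 1;

    if (node % 3 == 0) {
        /* Branch head: both the then-block and the else-block are reachable. */
        return count_paths_from(node + 1, n) + count_paths_from(node + 2, n);
    }

    /* Then/else block: its single successor is the next join node. */
    return count_paths_from(node - node % 3 + 3, n);
}
```

With this layout, `count_paths_from(0, n)` doubles for every additional diamond, so any analysis that treats each path separately would grow as 2^n.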
Could BarrierTailReplication perhaps limit the depth to which it is called, and fail to perform the transformation if the limit is reached?
Yes, that's something that might be useful: graceful failures from CFG analysis (corner) cases and a fallback to "fiber"-based execution. I did something similar to this in the (now removed) GCCBRIG frontend and HSAIL. Of course it would be best to fix the actual reason for these failures, but I feel that there will be tricky corner cases in the future when attempting static CFG generation from the kind of "dynamic concept" that the SPMD execution model is. That being said, I'm afraid neither I nor my group have much time to help track this down, since the issue doesn't affect our urgent tasks, but I can provide feedback on any pull requests.
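A minimal sketch of the depth-limit idea, in plain C rather than against the LLVM API: the traversal returns `false` once a limit is exceeded, so the caller can give up on the transformation and pick a fallback. The `struct block` type, `dfs_limited` and the `max_depth` parameter are all hypothetical and do not correspond to anything in pocl's BarrierTailReplication.cc.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCCS 2

/* Hypothetical stand-in for a basic block; not an LLVM or pocl type. */
struct block {
    struct block *succ[MAX_SUCCS];
    size_t num_succs;
    bool visited;
};

/*
 * Depth-first walk that reports failure instead of recursing without bound.
 * Returns true if every reachable block was visited within `max_depth`,
 * false as soon as the limit is exceeded.
 */
static bool dfs_limited(struct block *bb, unsigned depth, unsigned max_depth)
{
    if (bb->visited)
        return true;            /* already handled on another path */
    if (depth > max_depth)
        return false;           /* give up; caller chooses a fallback */

    bb->visited = true;
    for (size_t i = 0; i < bb->num_succs; ++i) {
        if (!dfs_limited(bb->succ[i], depth + 1, max_depth))
            return false;
    }
    return true;
}
```

A caller would start with `dfs_limited(entry, 0, limit)` and, on `false`, skip the transformation and use whatever fallback execution strategy is available.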
@pjaaskel Thanks, but it seems I was mistaken. I just did a quick test to identify how much time was spent in this pass's runOnFunction method compared to the overall compile time. Only 0.33s of the 12s compile time was spent there, so the bottleneck is actually elsewhere. I'll try running it under perf to pinpoint what is taking so long.
Can you try if the new WG method CBS handles these cases better? https://github.com/pocl/pocl/blame/master/doc/sphinx/source/using.rst#L353
I tried the latest release_3_1 (ac50b391), with all the WG methods (auto, loops, loopvec, repl, cbs). We are always around ~20s on an AMD Ryzen 7 3700X.
$ rm -rf build
$ mkdir -p build && cd build && cmake ..
$ make poclcc
$ for method in auto loops loopvec repl cbs; do echo "Trying $method"; POCL_KERNEL_CACHE=0 POCL_WORK_GROUP_METHOD=$method time ./bin/poclcc objdetect_hog.cl; done
Trying auto
18.18user 1.17system 0:19.39elapsed 99%CPU (0avgtext+0avgdata 229504maxresident)k
0inputs+2304outputs (0major+1772651minor)pagefaults 0swaps
Trying loops
18.61user 1.15system 0:19.80elapsed 99%CPU (0avgtext+0avgdata 230248maxresident)k
0inputs+2304outputs (0major+1773011minor)pagefaults 0swaps
Trying loopvec
19.26user 1.15system 0:20.44elapsed 99%CPU (0avgtext+0avgdata 229724maxresident)k
0inputs+2360outputs (0major+1593479minor)pagefaults 0swaps
Trying repl
18.73user 1.33system 0:20.10elapsed 99%CPU (0avgtext+0avgdata 229812maxresident)k
0inputs+2304outputs (0major+1772998minor)pagefaults 0swaps
Trying cbs
19.01user 1.08system 0:20.14elapsed 99%CPU (0avgtext+0avgdata 230352maxresident)k
0inputs+2304outputs (0major+1773016minor)pagefaults 0swaps
OK. But the kernel cache works well here, right, so it's only the cold run annoyance?
Yes, the kernel cache hit will return instantly. Only the cold compilation is long.