Enzyme
`No reverse pass found for function` error in multi file program
The library I am trying to compile has the following structure:
- BaseClass with two public functions, `void operation` and `virtual void actual_operation`
- A child class ImplementedClass, which implements `void actual_operation`
- A `wrapper` function that takes the necessary inputs and an ImplementedClass reference and calls `ImplementedClass.operation()`, which calls `actual_operation` down the line.
When everything is in a single file, the program compiles fine with Enzyme. But when I split each class into its own file, Enzyme gives me the following error:
No reverse pass found for _ZN9BaseClass9operationEddR5Point
declare dso_local void @_ZN9BaseClass9operationEddR5Point(%class.BaseClass* nonnull dereferenceable(24), double, double, %class.Point* nonnull align 8 dereferenceable(16)) local_unnamed_addr #0
UNREACHABLE executed at ../Enzyme/EnzymeLogic.cpp:3352!
I will append the files to reproduce in the following comments.
Full stack trace:
No reverse pass found for _ZN9BaseClass9operationEddR5Point
declare dso_local void @_ZN9BaseClass9operationEddR5Point(%class.BaseClass* nonnull dereferenceable(24), double, double, %class.Point* nonnull align 8 dereferenceable(16)) local_unnamed_addr #0
UNREACHABLE executed at ../Enzyme/EnzymeLogic.cpp:3352!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /opt/llvm_12/clang_12_prebuilt/bin/clang-12 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj --mrelax-relocations -disable-free -disable-llvm-verifier -discard-value-names -main-file-name main.cpp -mrelocation-model static -mframe-pointer=none -fmath-errno -fno-rounding-math -mconstructor-aliases -munwind-tables -target-cpu x86-64 -tune-cpu generic -fno-split-dwarf-inlining -debugger-tuning=gdb -resource-dir /opt/llvm_12/clang_12_prebuilt/lib/clang/12.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/x86_64-linux-gnu/c++/9 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/backward -internal-isystem /usr/local/include -internal-isystem /opt/llvm_12/clang_12_prebuilt/lib/clang/12.0.1/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -fdeprecated-macro -fdebug-compilation-dir /home/amit/Projects/COLABFIT/misc/dscribe_libdescriptor/dscribe/dscribe/enzyme_class_heirarchy_bug -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -load /opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so -faddrsig -o /tmp/main-373ff2.o -x c++ main.cpp
1. <eof> parser at end of file
2. Per-module optimization passes
3. Running pass 'Enzyme Pass' on module 'main.cpp'.
#0 0x00000000024c18a3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x24c18a3)
#1 0x00000000024bf7ee llvm::sys::RunSignalHandlers() (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x24bf7ee)
#2 0x00000000024c1d4f SignalHandler(int) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x24c1d4f)
#3 0x00007ff671a973c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x143c0)
#4 0x00007ff67157603b raise /build/glibc-sMfBJT/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007ff671555859 abort /build/glibc-sMfBJT/glibc-2.31/stdlib/abort.c:81:7
#6 0x0000000002450e21 (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x2450e21)
#7 0x00007ff6710cef8c EnzymeLogic::CreatePrimalAndGradient(ReverseCacheKey const&&, TypeAnalysis&, AugmentedReturn const*, bool) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x58bf8c)
#8 0x00007ff6711e6676 AdjointGenerator<AugmentedReturn const*>::visitCallInst(llvm::CallInst&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x6a3676)
#9 0x00007ff6711cbb4b llvm::InstVisitor<AdjointGenerator<AugmentedReturn const*>, void>::delegateCallInst(llvm::CallInst&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x688b4b)
#10 0x00007ff6711ba83d llvm::InstVisitor<AdjointGenerator<AugmentedReturn const*>, void>::visitCall(llvm::CallInst&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x67783d)
#11 0x00007ff6711b9cfa llvm::InstVisitor<AdjointGenerator<AugmentedReturn const*>, void>::visit(llvm::Instruction&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x676cfa)
#12 0x00007ff6710eb06d llvm::InstVisitor<AdjointGenerator<AugmentedReturn const*>, void>::visit(llvm::Instruction*) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x5a806d)
#13 0x00007ff6710d0e65 EnzymeLogic::CreatePrimalAndGradient(ReverseCacheKey const&&, TypeAnalysis&, AugmentedReturn const*, bool) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x58de65)
#14 0x00007ff67109febc (anonymous namespace)::Enzyme::HandleAutoDiff(llvm::CallInst*, llvm::TargetLibraryInfo&, DerivativeMode, bool) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x55cebc)
#15 0x00007ff67109cadb (anonymous namespace)::Enzyme::lowerEnzymeCalls(llvm::Function&, bool&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*> >&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x559adb)
#16 0x00007ff67109725e (anonymous namespace)::Enzyme::runOnModule(llvm::Module&) (/opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so+0x55425e)
#17 0x0000000001e6d08f llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x1e6d08f)
#18 0x000000000269d3b5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x269d3b5)
#19 0x000000000319415f clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x319415f)
#20 0x0000000002c1e8ec clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x2c1e8ec)
#21 0x0000000003be7554 clang::ParseAST(clang::Sema&, bool, bool) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x3be7554)
#22 0x0000000002bea627 clang::FrontendAction::Execute() (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x2bea627)
#23 0x0000000002b75151 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x2b75151)
#24 0x0000000002c7f38c clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0x2c7f38c)
#25 0x0000000000a06e72 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0xa06e72)
#26 0x0000000000a056b7 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0xa056b7)
#27 0x0000000000a0544b main (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0xa0544b)
#28 0x00007ff6715570b3 __libc_start_main /build/glibc-sMfBJT/glibc-2.31/csu/../csu/libc-start.c:342:3
#29 0x0000000000a023d9 _start (/opt/llvm_12/clang_12_prebuilt/bin/clang-12+0xa023d9)
clang-12: error: unable to execute command: Aborted (core dumped)
clang-12: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 12.0.1
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm_12/clang_12_prebuilt/bin
clang-12: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-12: note: diagnostic msg: /tmp/main-37dfcb.cpp
clang-12: note: diagnostic msg: /tmp/ic-bf3ee2.cpp
clang-12: note: diagnostic msg: /tmp/base-1042cb.cpp
clang-12: note: diagnostic msg: /tmp/point-da5e0c.cpp
clang-12: note: diagnostic msg: /tmp/main-37dfcb.sh
clang-12: note: diagnostic msg:
********************
I can provide a tar file with the CMake file of the failing build if needed.
All files are also contained in the tarball linked at the end.
Single file which compiles fine, single_file_main.cpp:
#include <iostream>
int enzyme_dup;
int enzyme_const;
int enzyme_out;
class Point{
public:
double x;
double y;
Point(double x, double y): x(x), y(y){}
Point():x(0.0),y(0.0){}
Point operator+(const Point& z){
Point P(0.,0.);
P.x = this->x + z.x;
P.y = this->y + z.y;
return P;
}
};
class BaseClass{
public:
double x2, y2;
BaseClass(double x2, double y2):x2(x2), y2(y2) {}
void operation(double x1, double y1, Point &P2){
Point P1 = Point(x1,y1);
this->actual_operation(P1, P2);
}
virtual void actual_operation(Point &P1, Point &P2) = 0;
};
class ImplementedClass: public BaseClass{
public:
ImplementedClass(double x, double y):BaseClass(x, y){}
void actual_operation(Point &P1, Point &P2);
};
void ImplementedClass::actual_operation(Point &P1, Point &P2){
P2.x = P1.x + this->x2;
P2.y = P1.y + this->y2;
};
void wrapper(double x, double y, Point &p1, ImplementedClass &IC1){
IC1.operation(x, y, p1);
}
void __enzyme_autodiff(void(*)(double, double, Point&, ImplementedClass &), int, double, double, int, double, double, int, Point &, Point &, int, ImplementedClass&);
int main()
{
double x = 2.0;
double y= 3.0;
Point p1(0.,0.);
ImplementedClass ic1(3.,4.);
ic1.operation(x, y, p1);
wrapper(x, y, p1, ic1);
double d_x = 2.0;
double d_y= 3.0;
Point d_p1(0.,0.);
__enzyme_autodiff(wrapper, enzyme_dup, x, d_x, enzyme_dup, y, d_y, enzyme_dup, p1, d_p1, enzyme_const, ic1);
return 0;
}
Compilation:
clang++ single_file_main.cpp -Xclang -load -Xclang /opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so -O3 -o a.out
Multiple files:
- point.h and point.cpp
class Point{
public:
double x;
double y;
Point(double x, double y);
Point();
Point operator+(const Point& z);
};
#include "point.h"
Point::Point(double x, double y): x(x), y(y){}
Point::Point():x(0.0),y(0.0){}
Point Point::operator+(const Point& z){
Point P(0.,0.);
P.x = this->x + z.x;
P.y = this->y + z.y;
return P;
}
- base.h and base.cpp
#include "point.h"
class BaseClass{
public:
double x2, y2;
BaseClass(double x2, double y2);
void operation(double x1, double y1, Point &P2);
virtual void actual_operation(Point &P1, Point &P2) = 0;
};
#include "base.h"
BaseClass::BaseClass(double x2, double y2):x2(x2), y2(y2) {}
void BaseClass::operation(double x1, double y1, Point &P2){
Point P1 = Point(x1,y1);
this->actual_operation(P1, P2);
}
- ic.h and ic.cpp
#include "base.h"
class ImplementedClass: public BaseClass{
public:
ImplementedClass(double x, double y);
void actual_operation(Point &P1, Point &P2);
};
#include "ic.h"
ImplementedClass::ImplementedClass(double x, double y):BaseClass(x, y){}
void ImplementedClass::actual_operation(Point &P1, Point &P2){
P2.x = P1.x + this->x2;
P2.y = P1.y + this->y2;
};
- main.cpp
#include <iostream>
#include "ic.h"
int enzyme_dup;
int enzyme_const;
int enzyme_out;
void wrapper(double x, double y, Point &p1, ImplementedClass &IC1){
IC1.operation(x, y, p1);
}
void __enzyme_autodiff(void(*)(double, double, Point&, ImplementedClass &), int, double, double, int, double, double, int, Point &, Point &, int, ImplementedClass&);
int main()
{
double x = 2.0;
double y= 3.0;
Point p1(0.,0.);
ImplementedClass ic1(3.,4.);
ic1.operation(x, y, p1);
wrapper(x, y, p1, ic1);
double d_x = 2.0;
double d_y= 3.0;
Point d_p1(0.,0.);
__enzyme_autodiff(wrapper, enzyme_dup, x, d_x, enzyme_dup, y, d_y, enzyme_dup, p1, d_p1, enzyme_const, ic1);
return 0;
}
Compiled as:
clang++ main.cpp ic.cpp base.cpp point.cpp -Xclang -load -Xclang /opt/enzyme/enzyme/build/Enzyme/ClangEnzyme-12.so -O3 -o a.out
Thank you. enzyme_class_heirarchy_bug.tar.gz
I currently don't have a PC at hand to look up the precise flags, but for multiple files we usually use LTO to embed and merge the bitcode. After that it is easy to run Enzyme as a linker plugin, so you would use LLDEnzyme.so instead of ClangEnzyme.so. Let me see if I can come up with the actual command to compile your example.
Something along these lines should work:
CXX := /home/zusez4/.cache/enzyme/rustc-1.59.0-src/build/x86_64-unknown-linux-gnu/llvm/bin/clang++
CC := /home/zusez4/.cache/enzyme/rustc-1.59.0-src/build/x86_64-unknown-linux-gnu/llvm/bin/clang
LLDEnzyme := /home/zusez4/prog/Enzyme/enzyme/build/Enzyme/LLDEnzyme-13.so
CXXFLAGS += -fuse-ld=lld -flto
LDFLAGS += -fuse-ld=lld -flto
EnzymeNoOpt := -fno-vectorize -fno-slp-vectorize -fno-unroll-loops
bench: benchFunctions.cpp
	$(CXX) -g $(CXXFLAGS) -c $(EnzymeNoOpt) $(INCLUDE_FLAGS) -o rb_b.o benchFunctions.cpp
	echo "Compiling done, now linking\n\n"
	$(CXX) -g $(CXXFLAGS) rb_b.o $(LDFLAGS) -o rb_b -Wl,--lto-legacy-pass-manager -Wl,-mllvm=-load=$(LLDEnzyme) -Wl,-save-temps
	echo "done"
Hi, I used your example to write the following Makefile, which compiles my example successfully:
CXX := clang++
CC := clang
LLDEnzyme := /opt/enzyme/enzyme/build/Enzyme/LLDEnzyme-12.so
CXXFLAGS += -fuse-ld=lld -flto
LDFLAGS += -fuse-ld=lld -flto
EnzymeNoOpt := -fno-vectorize -fno-slp-vectorize -fno-unroll-loops
main: main.o ic.o
	$(CXX) -O3 $(CXXFLAGS) ic.o main.o $(LDFLAGS) -o main.x -Wl,--lto-legacy-pass-manager -Wl,-mllvm=-load=$(LLDEnzyme) -Wl,-save-temps
ic.o: ic.cpp
	$(CXX) -O3 $(CXXFLAGS) -c $(EnzymeNoOpt) $(INCLUDE_FLAGS) -o ic.o ic.cpp
main.o: main.cpp
	$(CXX) -O3 $(CXXFLAGS) -c $(EnzymeNoOpt) $(INCLUDE_FLAGS) -o main.o main.cpp
Just a quick question: why are the `-fno-vectorize -fno-slp-vectorize -fno-unroll-loops` flags needed? I removed them and it still compiled fine. Doesn't passing them to Clang make it emit unoptimized code? (I remember calling `opt` explicitly for optimization earlier, before ClangEnzyme.so.)
Also, for some reason Enzyme hangs indefinitely while compiling my actual library, so I need to isolate the problem there as well. :|
Hi, those flags are, as you noted, not necessary for Enzyme. Experiments just showed that Enzyme generates slightly better code if its input has not been vectorized. (Other optimizations are important, however.) We also optimize the output generated by Enzyme again; during this second optimization we do allow vectorization and unrolling.
Regarding the hanging compilation: I had a simple test case that went from a few seconds to 8 minutes of compile time just with LTO. How long did you leave it running? It could also be some inefficient path in Enzyme, so if you can isolate it, that would probably help to find the reason.
I usually kill it after 30 seconds or so. I will try running it longer. I just spotted one mistake in my Makefile, so I am working on it. Hopefully it will be OK.
I tried for 30 min and it still did not complete! But I found the offending line: it's a nested vector data structure, `vector<vector<vector<vector<int>>>> bins`, initialized as
this->bins = vector<vector<vector<vector<int>>>>(this->nx, vector<vector<vector<int>>>(this->ny, vector<vector<int>>(this->nz, vector<int>())));
If I remove this line it compiles fine. I will try to refactor it later. Thanks.
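One possible shape for that refactor, as a minimal sketch only: keep a single outer vector of `nx*ny*nz` per-cell lists and compute the cell position by hand, so that only two levels of `std::vector` remain. The names `FlatBins`, `index`, and `at` are hypothetical and not part of the library discussed here, and whether this layout actually avoids the long Enzyme compile time has not been tested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vector<vector<vector<vector<int>>>> bins.
// All nx*ny*nz cells live in one outer vector, stored in row-major order.
class FlatBins {
public:
    FlatBins(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), cells_(nx * ny * nz) {}

    // Map a 3D cell coordinate to its position in the flat storage.
    std::size_t index(std::size_t ix, std::size_t iy, std::size_t iz) const {
        assert(ix < nx_ && iy < ny_ && iz < nz_);
        return (ix * ny_ + iy) * nz_ + iz;
    }

    // Access the list of ints stored in cell (ix, iy, iz).
    std::vector<int>& at(std::size_t ix, std::size_t iy, std::size_t iz) {
        return cells_[index(ix, iy, iz)];
    }
    const std::vector<int>& at(std::size_t ix, std::size_t iy, std::size_t iz) const {
        return cells_[index(ix, iy, iz)];
    }

private:
    std::size_t nx_, ny_, nz_;
    std::vector<std::vector<int>> cells_;  // one (initially empty) list per cell
};
```

With this sketch, `bins[i][j][k].push_back(v)` would become `bins.at(i, j, k).push_back(v)`, and the constructor call `FlatBins(nx, ny, nz)` replaces the nested initializer.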
@wsmoses This is a comparatively small example. Assuming that the time comes from an Enzyme analysis, this is probably a good starting point to spot another n^2 issue?
Here is a minimal working example that hangs on my machine (Clang 12, Enzyme 5989d4975, Apr 17th):
#include <iostream>
#include <vector>
using namespace std;
int enzyme_out;
int enzyme_dup;
int enzyme_const;
typedef int vec3[3];
void nested_vec(double *x)
{
int nx, ny, nz;
nx = 2; ny = 2; nz = 2;
auto y = vector<vector<vector<vector<int>>>>(nx, vector<vector<vector<int>>>(ny, vector<vector<int>>(nz, vector<int>())));
y[0][0][0].push_back(0);
*x *= y[0][0][0][0];
}
void __enzyme_autodiff(void (*)(double *),int , double *, double *);
int main(){
double x = 1.;
double dx = 1.;
__enzyme_autodiff(nested_vec, enzyme_dup, &x, &dx);
return 0;
}
Also copied to Compiler Explorer here: https://fwd.gymni.ch/86w52h