Enzyme
Excessive Caching in neural network mk2
Hello,
I'm building an experimental library on top of Enzyme to provide things like neural networks directly in C++.
As a first approximation, though, it can be seen as a torture test for Enzyme: I'll try to write the code in a pure and simple form (with very few external dependencies) and expect Enzyme to differentiate it efficiently. (I'll be trying to push the difficulties down to the compiler, but in a way that it should be able to solve them.)
This is a project I'm doing for fun in my free time. I'll report the Enzyme-related difficulties and desiderata I encounter, but please consider them low priority and don't put too much pressure on yourself to solve them.
https://github.com/GistNoesis/FunzymeAD
As an exercise, I have already written a simple neural network with nonlinear activations that trains on MNIST on the CPU, without batching the examples.
Everything works, but even though I have tried my best to prevent memory allocation by providing pre-allocated buffers and using restrict everywhere I could, Enzyme still emits some "Caching instruction" remarks. (If I skip the activations, I manage to get 0 caching instructions, as in #257.)
(Looking at the Enzyme source code, there seems to be a flag called enzyme-cache-never, which I can pass as -mllvm -enzyme-cache-never=1, but it doesn't seem to have any effect.)
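For readers who don't want to open the repository, here is a minimal sketch of the kind of kernel involved, reconstructed only from the source lines quoted in the remarks further down. It is not the actual FunzymeAD code: the names `dense_selu_forward`, `pre`, `kAlpha` and `kScale` are hypothetical, and the constants assume the activation is a standard SELU.

```cpp
#include <cmath>
#include <cstddef>

// Assumed SELU constants (standard published values); the thread only
// shows the names `alpha` and `scale`, not their values.
constexpr double kAlpha = 1.6732632423543772;
constexpr double kScale = 1.0507009873554805;

// Element-wise SELU, shaped like the expression quoted in the remarks.
inline void selu(const double* __restrict__ inp, double* __restrict__ out) {
    *out = *inp > 0.0 ? kScale * (*inp)
                      : kScale * kAlpha * (std::exp(*inp) - 1.0);
}

// Dense layer followed by SELU:
//   pre[i] = sum_j W[i*m+j] * p[j] + b[i],  out[i] = selu(pre[i]).
// `pre` and `out` are pre-allocated by the caller, so the kernel itself
// allocates nothing. `__restrict__` (a Clang/GCC extension) promises the
// compiler that the buffers do not overlap.
void dense_selu_forward(const double* __restrict__ W,
                        const double* __restrict__ b,
                        const double* __restrict__ p,
                        double* __restrict__ pre,
                        double* __restrict__ out,
                        std::size_t n, std::size_t m) {
    for (std::size_t i = 0; i < n; ++i) {
        double temp = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            temp += W[i * m + j] * p[j];
        }
        temp += b[i];
        pre[i] = temp;
        selu(&pre[i], &out[i]);
    }
}
```

The loads of `W`, `p`, `b` and the activation input in a kernel of this shape correspond to the loads that the remarks flag.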
I'm aiming for 0 extra memory allocation inside the tape, as this will make things easier to port to GPU, allow a lower memory footprint for higher-order derivatives, and allow usage on memory-constrained embedded devices.
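To make the goal concrete, here is a hand-written reverse pass for a dense layer followed by a SELU activation. It is only an illustration of what a tape-free derivative could look like, not what Enzyme generates and not code from the repository: the function name, the `pre` buffer holding the pre-activations, and the SELU constants are all assumptions. The reverse pass needs the primal values of `W`, `p` and the activation input; if those buffers are known to be unchanged since the forward pass, they can simply be re-read instead of being copied into a cache.

```cpp
#include <cmath>
#include <cstddef>

// Assumed SELU constants (standard published values).
constexpr double kAlpha = 1.6732632423543772;
constexpr double kScale = 1.0507009873554805;

// Reverse pass for out[i] = selu(pre[i]), pre[i] = sum_j W[i*m+j]*p[j] + b[i].
// Reads the primal buffers W, p and pre in place and accumulates into the
// caller-provided gradient buffers dW, db and dp. Nothing is allocated.
void dense_selu_reverse(const double* __restrict__ W,
                        const double* __restrict__ p,
                        const double* __restrict__ pre,
                        const double* __restrict__ dout,
                        double* __restrict__ dW,
                        double* __restrict__ db,
                        double* __restrict__ dp,
                        std::size_t n, std::size_t m) {
    for (std::size_t i = 0; i < n; ++i) {
        // d selu(x)/dx: scale for x > 0, scale * alpha * exp(x) otherwise.
        const double dselu =
            pre[i] > 0.0 ? kScale : kScale * kAlpha * std::exp(pre[i]);
        const double dpre = dout[i] * dselu;
        db[i] += dpre;
        for (std::size_t j = 0; j < m; ++j) {
            dW[i * m + j] += dpre * p[j];      // needs the primal p[j]
            dp[j] += dpre * W[i * m + j];      // needs the primal W[i*m+j]
        }
    }
}
```

The sketch accumulates with `+=`, so the caller is expected to zero `dW`, `db` and `dp` before the first call.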
git clone https://github.com/GistNoesis/FunzymeAD
Download and uncompress the MNIST dataset into the data folder if you want to run it successfully.
make testmnist && bin/testmnist
clang testmnist.cpp -lstdc++ -lm \
-Rpass=enzyme -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-12.so \
-O2 -o bin/testmnist -fno-exceptions
In file included from testmnist.cpp:8:
./funzyme/cpulayers.hpp:112:13: remark: Load may need caching %8 = load double, double* %arrayidx12.i, align 8, !dbg !43, !tbaa !45, !noalias !47 due to store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
temp += b[i];
^
./funzyme/cpulayers.hpp:110:17: remark: Load may need caching %10 = load double, double* %arrayidx.i, align 8, !dbg !58, !tbaa !45, !noalias !47 due to store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:36:14: remark: Load may need caching %12 = load double, double* %arrayidx25.i, align 8, !dbg !75, !tbaa !45, !alias.scope !78, !noalias !79 due to store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
*out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
^
./funzyme/cpulayers.hpp:112:13: remark: Load may need caching %21 = load double, double* %arrayidx12.i27, align 8, !dbg !116, !tbaa !45, !noalias !118 due to store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
temp += b[i];
^
./funzyme/cpulayers.hpp:110:17: remark: Load may need caching %23 = load double, double* %arrayidx.i35, align 8, !dbg !128, !tbaa !45, !noalias !118 due to store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:110:26: remark: Load may need caching %24 = load double, double* %arrayidx8.i36, align 8, !dbg !129, !tbaa !45, !alias.scope !101, !noalias !130 due to store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:36:14: remark: Load may need caching %25 = load double, double* %arrayidx25.i43, align 8, !dbg !145, !tbaa !45, !alias.scope !147, !noalias !148 due to store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
*out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
^
./funzyme/cpulayers.hpp:110:17: remark: Caching instruction %11 = load double, double* %arrayidx.i, align 8, !dbg !47, !tbaa !48, !noalias !50 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:36:14: remark: Caching instruction %21 = load double, double* %arrayidx25.i, align 8, !dbg !65, !tbaa !49, !alias.scope !68, !noalias !71 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
*out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
^
./funzyme/cpulayers.hpp:110:26: remark: Caching instruction %37 = load double, double* %arrayidx8.i36, align 8, !dbg !113, !tbaa !50, !alias.scope !114, !noalias !115 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:110:17: remark: Caching instruction %39 = load double, double* %arrayidx.i35, align 8, !dbg !112, !tbaa !50, !noalias !113 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
temp += W[i*m+j]*p[j];
^
./funzyme/cpulayers.hpp:36:14: remark: Caching instruction %51 = load double, double* %arrayidx25.i43, align 8, !dbg !127, !tbaa !50, !alias.scope !129, !noalias !132 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
*out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
^
testMnist
Nbr of training images = 60000
Nbr of training labels = 60000
Nbr of test images = 10000
Nbr of test labels = 10000
Before training
net initialized
Starting epoch : 0
average epoch loss 0.558435
Starting epoch : 1
Thanks
I'll take a deeper look at this (and other posted issues) in about a week [have an aggressive paper deadline this week].
Thanks
Can you try passing the layer by value rather than reference here: https://github.com/GistNoesis/FunzymeAD/blob/0e0449b86a74037eb797116f75c28a2dc296ac85/funzyme/cpulayers.hpp#L97
Hello, I tried passing it by value and it doesn't work. I also tried passing it via a restrict-qualified pointer, and that doesn't work either.