Excessive Caching in neural network mk2

Open unrealwill opened this issue 4 years ago • 4 comments

Hello,

I'm building an experimental library on top of Enzyme to provide things like neural networks directly in C++.

But as a first approximation it can be seen as a torture test for the Enzyme library: I'll try to write the code in a pure and simple form (with very few external dependencies) and expect Enzyme to be able to differentiate it efficiently. I'll be trying to push the difficulties down to the compiler, but in a way that it should be able to solve.

This is a project for fun that I'm doing in my free time. I'll report the Enzyme-related difficulties and desiderata I encounter, but please consider them low priority and don't put too much pressure on yourself to solve them.

https://github.com/GistNoesis/FunzymeAD

As an exercise, I have already successfully written a simple neural network with nonlinear activations that trains on MNIST on the CPU, without batching the examples.

Everything works fine, but even though I have tried my best to prevent memory allocation by providing pre-allocated buffers and using restrict everywhere I could, Enzyme still generates some "remark: Caching instruction" messages. (If I skip the activations, I manage to get 0 caching instructions, as in #257.)

(Looking at the Enzyme source code, there seems to be a flag called enzyme-cache-never, which I can pass as -mllvm -enzyme-cache-never=1, but it doesn't seem to have any effect.)

I'm aiming for 0 extra memory allocation inside the tape, as this will make things easier to port to GPU, allow a lower memory footprint for higher-order derivatives, and allow usage on memory-constrained embedded devices.
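To make the goal concrete, here is a minimal sketch of the pattern the remarks below point at: a dense layer followed by a SELU-style activation, with every buffer provided by the caller. The reverse pass is written by hand, not generated by Enzyme, and only illustrates that the gradient can be computed without any tape allocation as long as the pre-activation values stay available in a caller-owned buffer. The function names, the `pre` scratch buffer, and the constant values (the commonly used SELU constants) are assumptions for illustration and are not taken from FunzymeAD.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed constants: the commonly used SELU values.
constexpr double kScale = 1.0507009873554805;
constexpr double kAlpha = 1.6732632423543772;

// Forward pass: pre[i] = b[i] + sum_j W[i*m+j] * p[j], out[i] = selu(pre[i]).
// `pre` and `out` are caller-provided, so nothing is allocated here.
void dense_selu_forward(const double* __restrict W, const double* __restrict b,
                        const double* __restrict p, double* __restrict pre,
                        double* __restrict out, std::size_t n, std::size_t m) {
    assert(n > 0 && m > 0);
    for (std::size_t i = 0; i < n; ++i) {
        double temp = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            temp += W[i * m + j] * p[j];
        }
        temp += b[i];
        pre[i] = temp;  // kept so the reverse pass can re-read it
        out[i] = temp > 0.0 ? kScale * temp
                            : kScale * kAlpha * (std::exp(temp) - 1.0);
    }
}

// Hand-written reverse pass: accumulates gradients into dW, db and dp.
// It only reads W, p, pre and d_out, so it needs no tape of its own.
void dense_selu_reverse(const double* __restrict W, const double* __restrict p,
                        const double* __restrict pre,
                        const double* __restrict d_out,
                        double* __restrict dW, double* __restrict db,
                        double* __restrict dp, std::size_t n, std::size_t m) {
    assert(n > 0 && m > 0);
    for (std::size_t i = 0; i < n; ++i) {
        // Derivative of the activation, evaluated at the saved pre-activation.
        const double slope = pre[i] > 0.0
                                 ? kScale
                                 : kScale * kAlpha * std::exp(pre[i]);
        const double g = d_out[i] * slope;
        db[i] += g;
        for (std::size_t j = 0; j < m; ++j) {
            dW[i * m + j] += g * p[j];
            dp[j] += g * W[i * m + j];
        }
    }
}
```

The activation's derivative depends on its input, so the reverse pass has to see `pre[i]` again. A plausible reading of the "Load may need caching ... due to store" remarks is that Enzyme cannot prove such a value is still intact when the reverse pass needs it, and therefore saves a copy.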

git clone https://github.com/GistNoesis/FunzymeAD

Download and uncompress the MNIST dataset inside the data folder if you want to run it successfully.

make testmnist && bin/testmnist

clang testmnist.cpp -lstdc++ -lm \
    -Rpass=enzyme -Xclang -load -Xclang /usr/local/lib/ClangEnzyme-12.so \
    -O2 -o bin/testmnist -fno-exceptions
In file included from testmnist.cpp:8:
./funzyme/cpulayers.hpp:112:13: remark: Load may need caching   %8 = load double, double* %arrayidx12.i, align 8, !dbg !43, !tbaa !45, !noalias !47 due to   store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
    temp += b[i];
            ^
./funzyme/cpulayers.hpp:110:17: remark: Load may need caching   %10 = load double, double* %arrayidx.i, align 8, !dbg !58, !tbaa !45, !noalias !47 due to   store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                ^
./funzyme/cpulayers.hpp:36:14: remark: Load may need caching   %12 = load double, double* %arrayidx25.i, align 8, !dbg !75, !tbaa !45, !alias.scope !78, !noalias !79 due to   store double %add13.i28, double* %arrayidx15.i29, align 8, !dbg !121, !tbaa !45, !alias.scope !105, !noalias !122 [-Rpass=enzyme]
      *out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
             ^
./funzyme/cpulayers.hpp:112:13: remark: Load may need caching   %21 = load double, double* %arrayidx12.i27, align 8, !dbg !116, !tbaa !45, !noalias !118 due to   store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
    temp += b[i];
            ^
./funzyme/cpulayers.hpp:110:17: remark: Load may need caching   %23 = load double, double* %arrayidx.i35, align 8, !dbg !128, !tbaa !45, !noalias !118 due to   store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                ^
./funzyme/cpulayers.hpp:110:26: remark: Load may need caching   %24 = load double, double* %arrayidx8.i36, align 8, !dbg !129, !tbaa !45, !alias.scope !101, !noalias !130 due to   store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                         ^
./funzyme/cpulayers.hpp:36:14: remark: Load may need caching   %25 = load double, double* %arrayidx25.i43, align 8, !dbg !145, !tbaa !45, !alias.scope !147, !noalias !148 due to   store double %add13.i72, double* %arrayidx15.i73, align 8, !dbg !201, !tbaa !45, !alias.scope !172, !noalias !202 [-Rpass=enzyme]
      *out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
             ^
./funzyme/cpulayers.hpp:110:17: remark: Caching instruction   %11 = load double, double* %arrayidx.i, align 8, !dbg !47, !tbaa !48, !noalias !50 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                ^
./funzyme/cpulayers.hpp:36:14: remark: Caching instruction   %21 = load double, double* %arrayidx25.i, align 8, !dbg !65, !tbaa !49, !alias.scope !68, !noalias !71 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
      *out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
             ^
./funzyme/cpulayers.hpp:110:26: remark: Caching instruction   %37 = load double, double* %arrayidx8.i36, align 8, !dbg !113, !tbaa !50, !alias.scope !114, !noalias !115 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                         ^
./funzyme/cpulayers.hpp:110:17: remark: Caching instruction   %39 = load double, double* %arrayidx.i35, align 8, !dbg !112, !tbaa !50, !noalias !113 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
       temp +=  W[i*m+j]*p[j];
                ^
./funzyme/cpulayers.hpp:36:14: remark: Caching instruction   %51 = load double, double* %arrayidx25.i43, align 8, !dbg !127, !tbaa !50, !alias.scope !129, !noalias !132 legalRecompute: 0 shouldRecompute: 0 tryLegalRecomputeCheck: 1 [-Rpass=enzyme]
      *out = *inp > 0.0 ? scale * (*inp) : scale * alpha * ( exp(*inp ) - 1) ;
             ^
testMnist 
Nbr of training images = 60000
Nbr of training labels = 60000
Nbr of test images = 10000
Nbr of test labels = 10000
Before training
net initialized 
Starting epoch : 0
average epoch loss 0.558435
Starting epoch : 1

Thanks

unrealwill avatar Aug 09 '21 12:08 unrealwill

I'll take a deeper look at this (and other posted issues) in about a week [have an aggressive paper deadline this week].

wsmoses avatar Aug 10 '21 15:08 wsmoses

Thanks

unrealwill avatar Aug 11 '21 06:08 unrealwill

Can you try passing the layer by value rather than reference here: https://github.com/GistNoesis/FunzymeAD/blob/0e0449b86a74037eb797116f75c28a2dc296ac85/funzyme/cpulayers.hpp#L97

wsmoses avatar Aug 17 '21 00:08 wsmoses

Hello, I tried passing it by value and it doesn't work. I have also tried passing it via a restrict-qualified pointer and that doesn't work either.
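For readers following along, the three calling conventions discussed here can be sketched as below. The `Dense` struct, its fields, and the function names are hypothetical and do not reproduce the FunzymeAD code at the linked line; the sketch only shows how the layer object reaches the function in each variant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layer type: non-owning pointers to preallocated parameters.
struct Dense {
    const double* W;  // n*m weights, row-major
    const double* b;  // n biases
    std::size_t n;
    std::size_t m;
};

// Shared body: out[i] = b[i] + sum_j W[i*m+j] * p[j].
static void affine(const double* __restrict W, const double* __restrict b,
                   const double* __restrict p, double* __restrict out,
                   std::size_t n, std::size_t m) {
    assert(n > 0 && m > 0);
    for (std::size_t i = 0; i < n; ++i) {
        double temp = 0.0;
        for (std::size_t j = 0; j < m; ++j) {
            temp += W[i * m + j] * p[j];
        }
        out[i] = temp + b[i];
    }
}

// (1) By reference: the fields are read through a pointer to the caller's object.
void apply_ref(const Dense& layer, const double* p, double* out) {
    affine(layer.W, layer.b, p, out, layer.n, layer.m);
}

// (2) By value: the struct itself (pointers and sizes, not the weights) is copied.
void apply_val(Dense layer, const double* p, double* out) {
    affine(layer.W, layer.b, p, out, layer.n, layer.m);
}

// (3) Via a restrict-qualified pointer to the layer object.
void apply_ptr(const Dense* __restrict layer, const double* p, double* out) {
    affine(layer->W, layer->b, p, out, layer->n, layer->m);
}
```

All three compute the same result. They differ only in what the compiler can assume about aliasing between the layer object and the output buffer, which is presumably why the by-value variant was suggested.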

unrealwill avatar Aug 17 '21 09:08 unrealwill