What is the best way to integrate Enzyme into a large project?

Open zihay opened this issue 3 years ago • 3 comments

Hi, I am using Enzyme, specifically the linker plugin LLDEnzyme-14.so, in my project. I call __enzyme_autodiff() at the top level of the project hierarchy. I found that every time I make a small change in a file, Enzyme reruns on the whole project. As the project gets bigger, this can take several minutes. I'm wondering if there is a way to apply Enzyme to each compilation unit and generate the reverse pass for every marked function. Then, at link time, these reverse-pass functions would be linked together. If this worked, the build system could cache some of the targets and we wouldn't need to rebuild everything. Thanks!

zihay avatar Jun 30 '22 19:06 zihay

Caching (and compile times with Enzyme in general) are indeed not great at the moment. I work on differentiating a large (30k LOC) but simple function that hits an unfortunate corner case, and with Enzyme the compile times went from 6 s with LTO to 6 hrs with LTO + Enzyme. The only optimization for a large project that I am currently aware of is reorganizing your file structure (as far as possible) so that Enzyme runs only on a subpart and the rest of your project compiles independently. There is some ongoing work that might improve compile times in general and possibly (?) even help with running Enzyme multithreaded in a few months. However, no one is currently working on differentiating across compilation units, and IIRC it isn't trivial to get correct. Maybe wsmoses can chime in and give an overview of which changes are necessary.

ZuseZ4 avatar Jun 30 '22 22:06 ZuseZ4

I think the nicest way to do exactly what you're saying @haianyt is to use the custom derivative interface, as explained for CUDA code here: https://enzyme.mit.edu/getting_started/CUDAGuide/ (see Heterogeneous AD for a description)

A big part of the problem with the LTO approach is that LTO itself just takes forever to run (often regardless of Enzyme). For example, linking Clang with LTO alone takes an hour!

I think it would probably be useful to make nice syntactic sugar for doing a header-style custom derivative (e.g. like in the CUDA guide above, except that there it's done by hand)

wsmoses avatar Jul 01 '22 15:07 wsmoses

Thanks @ZuseZ4 and @wsmoses. I tried the method mentioned by @wsmoses, and it worked in my project. The build system can also parallelize the differentiation pass at the file level. However, it requires a lot of boilerplate code to make it work. Some syntactic sugar would definitely make the code cleaner!

zihay avatar Jul 05 '22 17:07 zihay