
[Feature] I can write code with large tensors, and the derivative runs as fast as PyTorch on GPU

Open · Mikolaj opened this issue 2 years ago · 11 comments

Desiderata:

  - [ ] MNIST example with MatMul only
  - [ ] MNIST example with convolutions
  - [ ] ...
  - [ ] GPT-3 on 64 GPUs

Supposedly, this can be done rather cheaply by offloading most of the work to LLVM and the Haskell packages that interface with it (llvm-hs and friends).

My loose notes from a chat with Alp Mestanogullari:

you can always write instances for ASTs and JIT-compile algorithms built on top of your differentiation mechanics

https://github.com/llvm-hs/llvm-hs-examples/blob/master/arith/Arith.hs

it's essentially scalar functions of just one (scalar) variable
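To make the "instances for ASTs" idea concrete, here is a minimal sketch; `Expr`, its constructors and `reify` are made-up illustration names, not anything from horde-ad or Arith.hs. Giving the expression type a `Num` instance means any function written against `Num a => a -> a` can be run at type `Expr` to produce a syntax tree instead of a value, which is exactly what a code generator then consumes:

```haskell
{-# LANGUAGE RankNTypes #-}
-- A made-up, minimal expression type; real ASTs would also cover
-- Fractional/Floating operations, let-bindings, tensors, etc.
module Expr where

data Expr
  = Konst Double      -- literal constant
  | Var String        -- free variable
  | Add Expr Expr
  | Mul Expr Expr
  deriving Show

instance Num Expr where
  (+)         = Add
  (*)         = Mul
  negate e    = Mul (Konst (-1)) e
  fromInteger = Konst . fromInteger
  abs         = error "abs: not needed for this sketch"
  signum      = error "signum: not needed for this sketch"

-- Reify a polymorphic numeric function as an AST in the single
-- variable "x"; the same trick extends to Floating for exp, sin, etc.
reify :: (forall a. Num a => a -> a) -> Expr
reify f = f (Var "x")

-- ghci> reify (\x -> x * x + 3)
-- Add (Mul (Var "x") (Var "x")) (Konst 3.0)
```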

and you can keep in mind that this could just be taken down to the GPU path instead with llvm-ptx

the above example should look fairly simple to you, despite your lack of familiarity with llvm-hs

https://github.com/llvm-hs/llvm-hs-examples/blob/master/arith/Arith.hs#L266 <- this is me building the LLVM AST/module for my little expression type.
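As a rough illustration of that lowering step, here is a hedged sketch that assumes the `IRBuilder` monads from llvm-hs-pure (roughly the llvm-hs 9–12 era API) rather than the hand-assembled AST used in Arith.hs, and reuses the made-up `Expr` type from the sketch above:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module ToLLVM where

import qualified Data.Map.Strict as Map
import LLVM.AST (Module, Operand (ConstantOperand))
import qualified LLVM.AST.Constant as C
import qualified LLVM.AST.Float as F
import LLVM.AST.Type (double)
import LLVM.IRBuilder.Instruction (fadd, fmul, ret)
import LLVM.IRBuilder.Module (ModuleBuilder, buildModule, function)
import LLVM.IRBuilder.Monad (IRBuilderT, block, named)

-- The made-up expression type from the sketch above, repeated here
-- to keep this file self-contained.
data Expr = Konst Double | Var String | Add Expr Expr | Mul Expr Expr

-- Emit LLVM instructions for an expression, given operands for its
-- free variables (the partial map lookup is fine for a sketch).
lowerExpr :: Map.Map String Operand -> Expr -> IRBuilderT ModuleBuilder Operand
lowerExpr env = go
  where
    go (Konst c)   = pure $ ConstantOperand (C.Float (F.Double c))
    go (Var name)  = pure $ env Map.! name
    go (Add e1 e2) = do a <- go e1; b <- go e2; fadd a b
    go (Mul e1 e2) = do a <- go e1; b <- go e2; fmul a b

-- Wrap a one-variable expression as a function @double f(double x)@
-- in a fresh module.
compileUnary :: Expr -> Module
compileUnary e = buildModule "exprModule" $
  function "f" [(double, "x")] double $ \[x] -> do
    _entry <- block `named` "entry"
    r <- lowerExpr (Map.fromList [("x", x)]) e
    ret r
```

From here the pure `Module` value would be handed to llvm-hs proper for JIT compilation, or taken down the llvm-ptx route for a GPU kernel.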

the rest is just plumbing.

so yeah if you want to take performance up a notch, I'd explore going down that route. you can still use/refer to external functions (BLAS and all), as long as you link to them, but this gives you a chance to optimize your expressions and emit optimal machine code for your computations, like a bunch of modern frameworks do
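On the "refer to external functions" point, a sketch under the same assumptions (llvm-hs-pure's `IRBuilder`; note that newer, opaque-pointer versions of llvm-hs also expect the callee's type to be passed to `call`): the generated module only declares the symbol with `extern`, and the actual libm or BLAS implementation is resolved at link or JIT time.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module CallExtern where

import LLVM.AST (Module)
import LLVM.AST.Type (double)
import LLVM.IRBuilder.Instruction (call, ret)
import LLVM.IRBuilder.Module (buildModule, extern, function)
import LLVM.IRBuilder.Monad (block, named)

-- Declare @double exp(double)@ as an external symbol and call it from a
-- generated function; the library providing the symbol only has to be
-- present when the module is linked or JIT-loaded.
callsExp :: Module
callsExp = buildModule "externDemo" $ do
  expFn <- extern "exp" [double] double
  function "g" [(double, "x")] double $ \[x] -> do
    _entry <- block `named` "entry"
    r <- call expFn [(x, [])]
    ret r
```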

and imagine staging this whole process (i.e. doing some TH to run the codegen and link the machine code in with your Haskell executable) so that none of that compilation happens at runtime... pretty ideal world here.

https://github.com/google-research/dex-lang#dex- <- the paper for this was very interesting btw, if you like thinking about ASTs/DSLs/compilation/numerics/AD/etc

Mikolaj · May 18 '22 07:05