Prototype integration with an LLVM or similar tech.

Open skrawcz opened this issue 3 years ago • 5 comments

The following assumes Numba, but "numba" could be replaced with "jax" or any other framework that optimizes Python code to execute faster.

Is your feature request related to a problem? Please describe.

Numba is a way to accelerate Python functions. To use it, you annotate your Python functions so that the JIT compiles them, and it generates faster code from them.

Currently the speedup only materializes on the second invocation of a function -- the first invocation pays the cost of compiling it. So to work with Hamilton, we'd have to compile ahead of time (AOT) if people only run a DAG once. Otherwise we could use the JIT for DAGs that people execute over and over again.

Describe the solution you'd like

Two solutions:

  1. Prototype the ability to compile a Hamilton graph ahead of time with Numba. You could use how we get Hamilton to run on Dask as a starting point (TODO: link to code). See the Numba docs for ahead-of-time compilation (linked under Additional context).
  2. Prototype the ability to use the JIT compiler with Numba. That way the first time someone runs execute, things are compiled (no speedup), but the second time, things are lightning quick! See the Numba JIT docs (linked under Additional context).
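To make the first-call versus second-call behaviour in option 2 concrete, here is a minimal sketch that times two calls to a jitted function. The names sum_of_squares and timed are made up for illustration, and the no-op fallback is only there so the snippet still runs where Numba is not installed (in which case both calls take about the same time).

```python
import time

try:
    from numba import njit  # Numba's nopython JIT decorator
except ImportError:
    # Assumption for this sketch only: without Numba, fall back to a no-op
    # decorator so the example still runs (with no speedup at all).
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # Plain Python loop: the kind of code a JIT can speed up considerably.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Under Numba the first call includes compilation; the second reuses the compiled code.
first_result, first_seconds = timed(sum_of_squares, 100_000)
second_result, second_seconds = timed(sum_of_squares, 100_000)
print(f"first call:  {first_seconds:.4f}s")
print(f"second call: {second_seconds:.4f}s")
```

For a DAG that is executed once, the first-call cost is all the user ever sees, which is why option 1 looks at compiling before execution.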

Things to think about with prototype (1):

  1. Since compiling ahead of time requires types -- we might need some better way to specify them? Or perhaps we can have Numba infer them?
  2. The output of compilation is another set of python module(s) -- this is what we'd then want to use for computation.
  3. What is therefore the correct order of operations? Build the function graph, compile it, then somehow build the graph again with the new functions (?), and use that for execution?
  4. What are the limitations of this approach in terms of use cases, etc.? We could limit it to numpy and Python primitive code only, for instance.
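On item 1, one possible way to supply types is to derive a Numba signature string from the Python annotations that the functions already carry. The sketch below assumes a small hand-written mapping (NUMBA_TYPE_NAMES) and a helper (numba_signature); both are hypothetical and not part of Hamilton or Numba. Passing an explicit signature makes Numba compile when the decorator is applied rather than on the first call; that is eager compilation, not the separate compiled module that the AOT docs describe.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python annotations to Numba type names; extend as needed.
NUMBA_TYPE_NAMES = {float: "float64", int: "int64"}


def numba_signature(func):
    """Build a signature string such as 'float64(float64, int64)' from annotations."""
    hints = get_type_hints(func)
    try:
        arg_types = [
            NUMBA_TYPE_NAMES[hints[name]]
            for name in inspect.signature(func).parameters
        ]
        return_type = NUMBA_TYPE_NAMES[hints["return"]]
    except KeyError as missing:
        # Raised for unannotated parameters or annotations not in the mapping.
        raise TypeError(f"no Numba type known for {missing}") from None
    return f"{return_type}({', '.join(arg_types)})"


# Hypothetical Hamilton-style function with primitive annotations.
def scaled_spend(spend: float, factor: int) -> float:
    return spend * factor


signature = numba_signature(scaled_spend)
print(signature)

try:
    from numba import njit
    # With an explicit signature, compilation happens here, not on the first call.
    compiled = njit(signature)(scaled_spend)
except ImportError:
    compiled = scaled_spend  # fallback so the sketch runs without Numba

print(compiled(2.0, 3))
```

Annotations outside the mapping raise a TypeError here, which is one way the "numpy and Python primitives only" limitation from item 4 could surface.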

Things to think about with prototype (2):

  1. What use cases does this make sense for?
  2. What are the limitations of this approach?

Describe alternatives you've considered

Haven't.

Additional context

  • https://numba.readthedocs.io/en/stable/user/pycc.html#overview
  • https://numba.readthedocs.io/en/stable/reference/types.html#numba-types
  • https://numba.readthedocs.io/en/stable/user/jit.html

skrawcz avatar Feb 01 '22 00:02 skrawcz

Could consider jax too.

skrawcz avatar Feb 11 '22 23:02 skrawcz

Started looking into numba -- seems like a pretty straightforward thing to test out. Hard to install on my M1, but I'll be giving it a spin with conda soon. In the case that every function operates on numpy arrays (e.g. this), it looks like it's pretty easy. We can write a driver that decorates everything with @jit, and type inference should work. However, that has two limitations:

  1. It might not always work with type inference (a bit of a black box)
  2. It doesn't work with pandas DFs

So we'll need to test it out. Note that newer pandas actually allows for numba integration with some specific functions; it's just not a global setting. https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html#pandas-numba-engine.

To do this right, we need actual type information, which we don't have if we annotate things with pd.Series.
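A rough sketch of the "decorate everything" step, assuming the functions live in a module: build a second module whose public functions are wrapped by the JIT. The helper jit_module and the stand-in module my_functions are hypothetical, njit is used as the nopython form of @jit, and whether Hamilton's driver would accept the wrapped objects in place of plain functions is untested here.

```python
import inspect
import types

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: no-op fallback when Numba is missing.
    def njit(func):
        return func


def jit_module(module):
    """Return a new module whose public functions are wrapped with njit."""
    compiled = types.ModuleType(module.__name__ + "_jit")
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if not name.startswith("_"):  # leave private helpers out
            setattr(compiled, name, njit(func))
    return compiled


# Stand-in for a user's module of Hamilton-style functions.
my_functions = types.ModuleType("my_functions")


def avg_3(a, b, c):
    return (a + b + c) / 3.0


def _helper(x):
    return x


my_functions.avg_3 = avg_3
my_functions._helper = _helper

fast = jit_module(my_functions)
print(fast.avg_3(1.0, 2.0, 3.0))
```

Nothing is compiled until each wrapped function is first called, so this only covers the lazy JIT case; type inference failures would also only show up at that first call.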

elijahbenizzy avatar Feb 21 '22 05:02 elijahbenizzy

Jax seems to rely on modifying the code -- e.g. importing jax.numpy in place of numpy -- which basically decorates/changes types. So not the code-free change we'd want, but potentially useful.

elijahbenizzy avatar Feb 21 '22 05:02 elijahbenizzy

An alternative? https://github.com/spcl/dace Some relevant links - https://github.com/spcl/npbench

skrawcz avatar Mar 10 '22 19:03 skrawcz

Numexpr could be interesting. We could easily compile pandas code to pandas eval, with the following steps:

  1. Parse a function into AST elements
  2. Validate that they’re numexpr compatible
  3. Form an expression
  4. Pass to pandas eval

This could optimize a single function/node. We could then use a symbolic manipulator to combine across nodes. Idea from @drudd
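Steps 1 to 3 can be sketched with the standard library alone. This assumes a function whose body is a single return of an arithmetic expression over its own arguments; the whitelist ALLOWED_NODES is a conservative guess at a numexpr-friendly subset rather than numexpr's actual grammar, and function_to_expression is a made-up name. It needs Python 3.9+ for ast.unparse.

```python
import ast

# Assumed whitelist: names, constants, and basic arithmetic only.
ALLOWED_NODES = (
    ast.BinOp, ast.UnaryOp, ast.Name, ast.Constant, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def function_to_expression(source):
    """Turn 'def f(a, b): return <expr>' into an expression string.

    Raises ValueError if the function is not a single whitelisted expression.
    """
    tree = ast.parse(source)  # step 1: parse into AST elements
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one function definition")
    func = tree.body[0]
    body = func.body
    if len(body) != 1 or not isinstance(body[0], ast.Return) or body[0].value is None:
        raise ValueError("expected a single 'return <expr>' statement")
    expr = body[0].value
    arg_names = {arg.arg for arg in func.args.args}
    for node in ast.walk(expr):  # step 2: validate every node
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in arg_names:
            raise ValueError(f"unknown name: {node.id}")
    return ast.unparse(expr)  # step 3: form the expression string


SOURCE = '''
def spend_per_signup(spend, signups):
    return spend / signups * 100
'''

expression = function_to_expression(SOURCE)
print(expression)  # spend / signups * 100
```

For step 4, the resulting string could be passed to pandas.eval with the inputs supplied through local_dict, or to DataFrame.eval when the argument names match column names. Functions with docstrings, calls, or multiple statements are rejected by this sketch.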

elijahbenizzy avatar Mar 11 '22 01:03 elijahbenizzy

We are moving repositories! Please see the new version of this issue at https://github.com/DAGWorks-Inc/hamilton/issues/14. Also, please give us a star/update any of your internal links.

elijahbenizzy avatar Feb 26 '23 17:02 elijahbenizzy