Prototype arkouda.for_each method.

Open wlav opened this issue 1 year ago • 1 comments

This PR is a WIP, added on request. It is an initial prototype showing how an arkouda.for_each method could be implemented. A true production version should generate LLVM IR on the client side and send that instead of pickling the functor, then JIT it with ORCJit on the server. (Note that LLVM IR is not as stable as Python's pickle format, so there may be deviations between the LLVM used in Chapel and the LLVM used in Numba.) Doing so would remove the server's dependency on Python.

This PR adds a client-side dependency on cloudpickle. That module packages everything that is pickled, whereas the standard pickle only stores references to modules, which are then imported when unpickling. This is necessary because client-side Python modules cannot be expected to be available server-side. Going to LLVM IR instead would remove the need for this dependency.
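The by-reference behaviour of the standard pickle can be seen with the standard library alone. The sketch below is only an illustration: it does not use cloudpickle, and the marshal round trip stands in for the general idea of shipping the code itself. It is not how cloudpickle or this PR is implemented. The polynomial p3 and its coefficients are made up for the example.

```python
import marshal
import pickle
import statistics
import types

# Standard pickle stores an importable function as "module + name" only;
# the receiver must be able to import that module itself.
by_reference = pickle.dumps(statistics.mean)
has_module_name = b"statistics" in by_reference
has_bytecode = statistics.mean.__code__.co_code in by_reference

# A function without an importable name cannot be pickled by reference.
try:
    pickle.dumps(lambda x: x + 1.0)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False


def p3(x):
    # Hypothetical cubic, written in Horner form.
    return ((2.0 * x - 1.0) * x + 0.5) * x + 3.0


# Shipping "by value" means sending the code object itself. marshal is used
# here purely as a stand-in; its format is specific to the Python version.
code_blob = marshal.dumps(p3.__code__)
rebuilt = types.FunctionType(marshal.loads(code_blob), {})
```

Here `rebuilt` can be called without access to the original definition of p3, which is the property the server needs when the client's modules are not installed there.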

PythonMsg needs to be uncommented in ServerModules.cfg to enable it, and building/linking with Python requires ARKOUDA_PYTHON_SERVER=1 when running make.

Some rough numbers showing why we're doing this (single node, all local, 10^10 float64s, a 1D p3):

numpy, externalized loop:      1 hr 50 min (est.)
arkouda, externalized loop:    178 days (est.)
numpy.poly1d:                  92 s
C++ vector -O3:                8 s
arkouda.for_each:              3 s

I'm somewhat new to Chapel; I'm presuming that it outperforms C++ because the code is either vectorized or threaded. numpy.poly1d is that much slower because it creates temporary arrays and then copies memory between them. The arkouda.for_each timing includes running the JIT (no warmup; Numba itself also adds a one-off overhead of ~0.3s here).
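To illustrate the point about temporaries, here is a minimal pure-Python sketch contrasting array-at-a-time Horner evaluation, where every multiply and add allocates a full-length intermediate, with a fused per-element pass of the kind a for_each-style kernel performs. The coefficients and function names are hypothetical. This only mimics the allocation pattern and is not numpy's or arkouda's actual code.

```python
from array import array

# Hypothetical p3: 2x^3 - x^2 + 0.5x + 3, highest degree first.
COEFFS = (2.0, -1.0, 0.5, 3.0)


def p3_scalar(x):
    # Scalar Horner evaluation: no intermediate arrays at all.
    c3, c2, c1, c0 = COEFFS
    return ((c3 * x + c2) * x + c1) * x + c0


def eval_array_at_a_time(xs):
    # Whole-array Horner: each step materializes two full-length arrays.
    temporaries = 0
    y = array("d", [COEFFS[0]] * len(xs))
    for c in COEFFS[1:]:
        prod = array("d", (a * b for a, b in zip(y, xs)))  # temporary 1
        y = array("d", (p + c for p in prod))              # temporary 2
        temporaries += 2
    return y, temporaries


def eval_fused(xs):
    # One pass over the data, one output array.
    return array("d", (p3_scalar(x) for x in xs))


xs = array("d", [0.0, 1.0, 2.0, -1.5])
whole, n_temps = eval_array_at_a_time(xs)
fused = eval_fused(xs)
```

Both variants perform the same arithmetic in the same order, so the results match; the difference is that the array-at-a-time form streams the full data set through memory once per operation instead of once in total.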

Jul 18 '24 19:07 wlav

Just a note, to compile this PR you need to set export ARKOUDA_PYTHON_SERVER=True.

Sep 27 '24 16:09 ajpotts