Convert python functions into numexpr bytecode

Open KOLANICH opened this issue 7 years ago • 4 comments

For now numexpr is not Pythonic: we have to deal with strings, and it's messy. This can be fixed by transforming Python functions into numexpr bytecode. Of course, only a subset of all possible functions can be transformed. There are two ways: transform Python bytecode or transform the Python AST. Since Python bytecode is an implementation detail that is subject to change, transforming the AST is the more compatible way, but it is limited to cases where we can get the original Python source. However, there are packages that decompile bytecode back into Python source (I use https://github.com/rocky/python-uncompyle6 ). So since we have to deal with bytecode anyway, why not deal with it directly, using a library that handles the differences in bytecode format between versions?
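To make the two routes concrete, here is a minimal sketch using only the standard library. The function f and the SOURCE string are hypothetical, made up for illustration; nothing here is numexpr API. It only shows what each route has to work with: the AST needs the source text, while the bytecode is always reachable from the function object but uses opcode names that vary between CPython versions.

```python
import ast
import dis

# Hypothetical example function, kept as source text so both routes can be shown.
SOURCE = "def f(a, b):\n    return a * 2.0 + b ** 2\n"

namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
f = namespace["f"]

# Route 1: AST. Requires the source text, but the node types are a documented API.
tree = ast.parse(SOURCE)
return_expr = tree.body[0].body[0].value  # the expression after `return`
print(ast.dump(return_expr))

# Route 2: bytecode. Always available via the function's code object,
# but the opcode names printed here depend on the CPython version.
opnames = [ins.opname for ins in dis.get_instructions(f)]
print(opnames)
```

Running this on two different Python versions should print the same AST shape but a different opcode list, which is the compatibility trade-off described above.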

KOLANICH avatar Dec 30 '17 10:12 KOLANICH

The current, deployed branch (2.6) calls Python's compile and then generates its own custom AST from that. The development branch (3.0) actually uses the Python ast module and then parses the result with a functional approach, which ends up being significantly faster overall. The simplest expressions now take just under 100 µs to produce a NumExpr function. There are also no translators in the standard Python library to transform ast.AST objects to compiled Python bytecode, or vice versa. NumExpr has a policy of having minimal dependencies. Right now it's numpy, setuptools, and that's it.

The NumExpr workflow generally revolves around:

  1. Prototype the function in NumPy.
  2. Drop the np. attributes, encase in ne.evaluate('<...>'), and now it's 4-8x faster for most cases.
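Step 2 can be done by hand, but as a rough sketch it could also be automated with the standard ast module. DropNumpyPrefix and to_numexpr_string are hypothetical helper names, not part of NumExpr, and the sketch assumes the prototype is a single expression whose only NumPy usage is np.<function> calls that NumExpr also supports. It needs Python 3.9+ for ast.unparse.

```python
import ast


class DropNumpyPrefix(ast.NodeTransformer):
    """Rewrite `np.name` / `numpy.name` attribute access to a bare `name`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)
        if isinstance(node.value, ast.Name) and node.value.id in ("np", "numpy"):
            return ast.copy_location(ast.Name(id=node.attr, ctx=node.ctx), node)
        return node


def to_numexpr_string(numpy_expr):
    """Return the expression text with the NumPy module prefix removed."""
    tree = ast.parse(numpy_expr, mode="eval")
    tree = ast.fix_missing_locations(DropNumpyPrefix().visit(tree))
    return ast.unparse(tree.body)  # ast.unparse requires Python 3.9+


expr = to_numexpr_string("np.sqrt(bar*2.0 + car**2)")
print(expr)
# With numexpr installed and bar, car defined as arrays, the string could
# then be handed over as: ne.evaluate(expr)
```

This only handles the prefix; anything NumExpr does not implement would still have to be rewritten manually.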

So the main considerations for anything in NumExpr's development are: first, how fast is it to execute? And second, how much effort is required from the user to transform a NumPy prototype into a NumExpr function?

With the dev branch we've added almost all of the NumPy data types, so now we use NumPy 'safe' casting rules, which makes NumExpr 3 much more directly translatable from NumPy Python code. We can also have multiple lines and named temporary arrays. Something like Numba has been moving further away from NumPy-like syntax. There's not a lot of point in Numba and NumExpr trying to do the same thing; they have their niche, we have ours. With @jit functions you're no longer writing stuff in-line, and often to get Numba to work well you have to wrap a jitted function with a Python function, so it starts to feel like about the same amount of effort as writing a C-extension, with the benefit of platform independence.

Possibly the most 'Pythonic' way for NumExpr to interact with NumPy code would be a context manager, like the with keyword:

foo = np.sqrt( bar*2.0 + car**2 )

to

as_ast @numexpr:
    foo = np.sqrt( bar*2.0 + car**2 )

The idea is that the as_ast context manager tells the interpreter not to execute that piece of code but instead to return the result of ast.parse().
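The as_ast syntax does not exist in Python, so the closest thing available today is to keep the block as a string and call ast.parse() on it yourself. A minimal sketch of what such a parse hands back, using only the standard library (the variable names block, target and inputs are made up for illustration):

```python
import ast

# The statement from above, held as text so it is parsed rather than executed.
block = "foo = np.sqrt( bar*2.0 + car**2 )"

tree = ast.parse(block)
assign = tree.body[0]  # an ast.Assign node; nothing has been evaluated

# Name being assigned to, and the array names the expression reads from
# (the `np` module name itself is skipped).
target = assign.targets[0].id
inputs = sorted(
    {
        node.id
        for node in ast.walk(assign.value)
        if isinstance(node, ast.Name) and node.id != "np"
    }
)
print(target, inputs)
```

Note that bar and car do not need to exist for this to run, which is exactly the "don't execute, just parse" behaviour the hypothetical context manager would provide.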

I looked into it and it essentially requires a PEP. I've been lurking in the python-dev mailing list and my impression is that any proposals that suggest bypassing the CPython interpreter would not be well received. At present I struggle to find the time to even push out the beta version of NumExpr 3.0, so making modifications to the Python language is not something I have time for.

robbmcleod avatar Dec 30 '17 17:12 robbmcleod

The Meta package could solve this: it can decompile Python code objects into ast.AST. The only trouble is, I don't know how well it's maintained. If there's sufficient interest in using it for numexpr, maybe it could be adopted.

jpivarski avatar Jun 16 '18 11:06 jpivarski

Forgot to add this: https://github.com/srossross/Meta

jpivarski avatar Jun 16 '18 12:06 jpivarski

Thanks for the suggestion, but as numexpr is a requirement of modules like pandas and pytables, we basically have a hard rule against requiring any external modules other than numpy. We would be de facto adding dependencies to our downstream modules, which are much bigger and more important to the Python community.

With NumExpr you can basically enclose your numpy-prototyped statements in triple quotes and get them running in a multi-threaded virtual machine. That's the niche, and speed is often more important than Pythonic-ness. Tools like Theano and sympy can do this sort of symbolic numerical evaluation, and there are drawbacks to it.

robbmcleod avatar Jun 17 '18 01:06 robbmcleod
