numexpr
Convert python functions into numexpr bytecode
For now numexpr is not Pythonic; we have to deal with strings, which is messy. This could be fixed by transforming Python functions into numexpr bytecode. Of course, only a subset of all possible functions can be transformed. There are two ways to do it: transform the Python bytecode, or transform the Python AST. Since Python bytecode is an implementation detail that changes between versions, transforming the AST is the more compatible route, but it is limited to cases where we can get the original Python source. However, there are packages that decompile Python source from bytecode (I use https://github.com/rocky/python-uncompyle6 ). So since we have to deal with bytecode anyway, why not deal with it directly, using libraries that handle the differences in bytecode format across versions?
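To make the two routes concrete, here is a minimal sketch using only the standard library: dis to inspect a prototype function's bytecode, and inspect plus ast to recover its AST from source. The function foo is just an illustrative example, not part of numexpr:

```python
import ast
import dis
import inspect

import numpy as np


def foo(bar, car):
    # A NumPy prototype we might want to translate to numexpr.
    return np.sqrt(bar * 2.0 + car ** 2)


# Route 1: the bytecode view, a version-dependent implementation detail.
dis.dis(foo)

# Route 2: the AST view, which requires access to the original source.
tree = ast.parse(inspect.getsource(foo))
print(ast.dump(tree))
```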
The current, deployed branch (2.6) calls Python's compile and then generates its own custom AST from that. The development branch (3.0) actually uses the Python ast module and then parses the result with a functional approach, which ends up being significantly faster overall. The simplest expressions now take just under 100 µs to produce a NumExpr function. There's also no translator in the standard Python library to go from compiled Python bytecode back to ast.AST objects. NumExpr has a policy of keeping dependencies minimal: right now it's numpy and setuptools, and that's it.
The NumExpr workflow generally revolves around:
- Prototype the function in NumPy.
- Drop the np. attributes, wrap the expression in ne.evaluate('<...>'), and now it's 4-8x faster for most cases (see the sketch below).
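As a minimal illustration of that two-step workflow (the array names and sizes here are made up for the example):

```python
import numexpr as ne
import numpy as np

bar = np.random.rand(1_000_000)
car = np.random.rand(1_000_000)

# Step 1: prototype in NumPy.
foo_np = np.sqrt(bar * 2.0 + car ** 2)

# Step 2: drop the np. attributes and wrap the expression in ne.evaluate().
foo_ne = ne.evaluate('sqrt(bar * 2.0 + car ** 2)')

assert np.allclose(foo_np, foo_ne)
```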
So the main considerations for anything in NumExpr's development are: first, how fast is it to execute? And second, how much effort is required from the user to transform a NumPy prototype into a NumExpr function?
With the dev branch we've added almost all of the NumPy data types, and we now use NumPy 'safe' casting rules, which makes NumExpr 3 much more directly translatable from NumPy Python code. We can also have multiple lines and named temporary arrays. Something like Numba has been moving further away from NumPy-like syntax. There's not a lot of point in Numba and NumExpr trying to do the same thing; they have their niche, we have ours. With @jit functions you're no longer writing stuff in-line, and often to get Numba to work well you have to wrap a jitted function with a Python function, so it starts to feel like about the same amount of effort as writing a C-extension, with the benefit of platform independence.
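For reference, the 'safe' casting rules mentioned above are NumPy's own; here is a small sketch of what they allow and reject, using plain NumPy, since the exact NumExpr 3 behaviour is whatever the dev branch implements:

```python
import numpy as np

# 'safe' casting permits value-preserving promotions...
print(np.can_cast(np.int32, np.float64, casting='safe'))      # True
print(np.can_cast(np.float32, np.complex128, casting='safe'))  # True

# ...but rejects conversions that could lose information.
print(np.can_cast(np.float64, np.int32, casting='safe'))      # False
print(np.can_cast(np.int64, np.float32, casting='safe'))      # False
```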
Possibly the most 'Pythonic' way for NumExpr to interact with NumPy code would be a context manager, like the with keyword, turning

```python
foo = np.sqrt( bar*2.0 + car**2 )
```

into

```
as_ast @numexpr:
    foo = np.sqrt( bar*2.0 + car**2 )
```
The idea is that the as_ast context manager would tell the interpreter not to execute that piece of code but instead return the result of ast.parse().
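Purely as an illustration of that idea (this is not a numexpr API, and the proposed as_ast/@numexpr syntax does not exist in Python), a decorator can approximate the behaviour today by reading the function's source and returning its parsed AST instead of a callable:

```python
import ast
import inspect
import textwrap


def as_ast(func):
    """Hypothetical decorator: return the parsed AST of func's body
    instead of a callable.  Only works when the source is available."""
    source = textwrap.dedent(inspect.getsource(func))
    tree = ast.parse(source)
    # Drop the decorator line and keep just the function body statements.
    func_def = tree.body[0]
    return ast.Module(body=func_def.body, type_ignores=[])


@as_ast
def expr():
    # np, bar and car are only referenced symbolically; this body never runs.
    foo = np.sqrt(bar * 2.0 + car ** 2)


print(ast.dump(expr))  # expr is now an ast.Module, not a function
```

From there, an ast.NodeTransformer could strip the np. attributes and unparse the body into a string for ne.evaluate(), but that step is left out here.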
I looked into it and it essentially requires a PEP. I've been lurking on the python-dev mailing list and my impression is that any proposal that suggests bypassing the CPython interpreter would not be well received. At present I struggle to find the time even to push out the beta version of NumExpr 3.0, so making modifications to the Python language is not something I have time for.
The Meta package could solve this: it can decompile Python code objects into ast.AST. The only trouble is, I don't know how well it's maintained. If there's sufficient interest in using it for numexpr, maybe it could be adopted.
Forgot to add this: https://github.com/srossross/Meta
Thanks for the suggestion, but as numexpr is a requirement of modules like pandas and pytables, we basically have a hard rule against requiring any external modules outside of numpy. We would be de-facto adding dependencies to our downstream modules, which are much bigger and more important to the Python community.
With NumExpr you can basically enclose your NumPy-prototyped statements in triple-quotes and get them running in a multi-threaded virtual machine. That's the niche, and speed is often more important than Pythonic-ness. Tools like Theano and sympy can do this sort of symbolic numerical evaluation, and there are drawbacks to it.
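For context, here is a small example of the symbolic route via sympy, whose lambdify can target numexpr as an evaluation backend; the expression and array sizes are made up for the example, and both packages are assumed to be installed:

```python
import numpy as np
import sympy as sp

bar, car = sp.symbols('bar car')
expr = sp.sqrt(bar * 2.0 + car ** 2)

# Build a callable from the symbolic expression; modules='numexpr' asks
# sympy to evaluate it through numexpr's virtual machine.
f = sp.lambdify((bar, car), expr, modules='numexpr')

a = np.random.rand(1000)
b = np.random.rand(1000)
print(np.allclose(f(a, b), np.sqrt(a * 2.0 + b ** 2)))
```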