weaverbird
Rewrite formula implementation for python/pandas backend
The current implementation is based on pandas' eval, which is quite permissive (it accepts close to any valid Python expression) but doesn't work with some modern dtypes, such as nullable integers: https://pandas.pydata.org/docs/user_guide/integer_na.html
My suggestion would be to use the same formula parser for all backends, so we always get the same AST and the same set of nodes and operations to support. Each backend could then implement the translation of this AST for its own engine.
For pandas, that would mean translating the formula a + b to df["a"] + df["b"] instead of using df.eval("a + b").
Advantages:
- same formula syntax for all backends
- a clear set of formula operators supported across backends
- pandas' eval is quite limited (for example, it doesn't work with nullable integer dtypes)
- eval is not safe
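
To illustrate the translation proposed above, here is a minimal sketch. It assumes Python's built-in `ast` module as a stand-in for the shared parser (the real parser is still to be chosen), and the names `evaluate_formula` and `_BINARY_OPS` are hypothetical. It only handles column names, numeric constants and the four basic arithmetic operators, and rejects everything else.

```python
import ast
import operator

# Hypothetical whitelist of supported operators. With a shared parser,
# this set would be defined once and be the same for every backend.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(df, formula):
    """Evaluate a formula such as "a + b" as df["a"] + df["b"].

    `df` is any object indexable by column name, e.g. a pandas DataFrame.
    Assumption: Python's `ast` module stands in for the shared parser.
    """
    tree = ast.parse(formula, mode="eval")

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            op = _BINARY_OPS[type(node.op)]
            return op(visit(node.left), visit(node.right))
        if isinstance(node, ast.Name):
            # A bare identifier is treated as a column lookup.
            return df[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        # Anything outside the whitelist (calls, attributes, ...) is rejected.
        raise ValueError(f"Unsupported formula node: {type(node).__name__}")

    return visit(tree)
```

With a pandas DataFrame, `evaluate_formula(df, "a + b")` reduces to `df["a"] + df["b"]`, so the arithmetic goes through the regular Series operators rather than `df.eval`. Because only whitelisted nodes are evaluated, an input such as `__import__("os")` raises `ValueError` instead of being executed.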