
Rewrite formula implementation for python/pandas backend

Open davinov opened this issue 2 years ago • 0 comments

The current implementation is based on pandas' eval, which is quite permissive (it accepts close to any valid Python expression) but doesn't work with some modern dtypes, such as nullable integers: https://pandas.pydata.org/docs/user_guide/integer_na.html

My suggestion would be to use the same formula parser for all backends, so we always get the same AST and the same set of nodes and operations to support. Each backend could then implement the translation of this AST for its own engine. For pandas, that would mean translating the formula a + b into df[a] + df[b] instead of calling df.eval(a + b).
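To make the idea concrete, here is a minimal sketch of what the pandas side could look like. It is only an illustration: it uses Python's built-in `ast` module as a stand-in for the shared formula parser (which this issue does not specify), and the names `evaluate_formula`, `_translate` and `_BIN_OPS` are hypothetical, not existing weaverbird code.

```python
import ast
import operator

import pandas as pd

# Explicit whitelist of supported operators (an assumption for this sketch).
# Anything outside this set is rejected instead of being evaluated.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _translate(df: pd.DataFrame, node: ast.AST):
    """Recursively turn one AST node into a pandas operation."""
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left = _translate(df, node.left)
        right = _translate(df, node.right)
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.Name):
        # A bare identifier is treated as a column name: a -> df["a"]
        return df[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError(f"Unsupported formula element: {type(node).__name__}")


def evaluate_formula(df: pd.DataFrame, formula: str):
    """Hypothetical helper: parse the formula, then walk its AST."""
    tree = ast.parse(formula, mode="eval")
    return _translate(df, tree.body)


df = pd.DataFrame(
    {
        "a": pd.array([1, 2, None], dtype="Int64"),
        "b": pd.array([10, 20, 30], dtype="Int64"),
    }
)
result = evaluate_formula(df, "a + b * 2")
print(result)
```

Because the formula is applied through regular Series operators, the result keeps the nullable Int64 dtype and the missing value propagates as NA. Anything that is not a whitelisted node, such as a function call, raises a ValueError rather than being executed.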

Advantages:

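  • same formula syntax for all backends
  • a clear set of formula operators supported across backends
  • pandas' eval is quite limited (for example, with the nullable dtypes mentioned above)
  • eval is not safe, since it can evaluate close to any Python expression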

davinov · Apr 14 '22 12:04