Create Hamilton converter for pandas code
Issue by skrawcz
Friday May 13, 2022 at 22:54 GMT
Originally opened as https://github.com/stitchfix/hamilton/issues/132
Is your feature request related to a problem? Please describe.
With Hamilton, you need to restructure your code. This can be too much of a friction point for some people. Wouldn't it be nice if we had a way to help automate this step?
Describe the solution you'd like
We should be able to write some Python code that parses the AST to convert code like:
df['a'] = df['b'] + df['c']
into
def a(b: pd.Series, c: pd.Series) -> pd.Series:
    return b + c
The core of this problem is building code that parses Python code and outputs/prints Hamilton functions. Once we have that, we can think about where we could provide this, e.g. a CLI, a website, or some other means...
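A minimal sketch of the AST approach, using only the standard library `ast` module (Python 3.9+, for `ast.unparse`). It assumes every relevant statement has the form `df['x'] = <expression>` and treats any string-indexed subscript as a column read; `to_hamilton` and its helpers are hypothetical names, not part of Hamilton.

```python
import ast
from typing import Optional


def _column_name(node: ast.AST) -> Optional[str]:
    """Return 'x' if node looks like frame['x'] (any frame variable), else None."""
    if (
        isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and isinstance(node.slice, ast.Constant)  # Python 3.9+ AST shape
        and isinstance(node.slice.value, str)
    ):
        return node.slice.value
    return None


class _ColumnsToNames(ast.NodeTransformer):
    """Rewrites df['x'] into a bare name x and records which columns were read."""

    def __init__(self):
        self.columns = []  # first-seen order becomes the parameter order

    def visit_Subscript(self, node):
        name = _column_name(node)
        if name is None:
            return self.generic_visit(node)
        if name not in self.columns:
            self.columns.append(name)
        return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)


def to_hamilton(source: str) -> str:
    """Walk a script and emit one Hamilton-style function per column assignment."""
    functions = []
    for stmt in ast.parse(source).body:
        # Skip anything that is not a single-target assignment like df['a'] = ...
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target = _column_name(stmt.targets[0])
        if target is None:
            continue
        rewriter = _ColumnsToNames()
        body = rewriter.visit(stmt.value)
        params = ", ".join(f"{col}: pd.Series" for col in rewriter.columns)
        functions.append(
            f"def {target}({params}) -> pd.Series:\n    return {ast.unparse(body)}\n"
        )
    return "\n\n".join(functions)


print(to_hamilton("df['a'] = df['b'] + df['c']\ndf['d'] = df['a'] * 2"))
```

For the `df['a'] = df['b'] + df['c']` example this produces the `def a(b: pd.Series, c: pd.Series) -> pd.Series:` function shown above. A real converter would need more than this: reassigning a column from itself (`df['a'] = df['a'] + 1`) yields a self-dependent function, column names that are not valid Python identifiers produce invalid code, and control flow or multi-column operations are silently skipped.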
Describe alternatives you've considered
Not doing this.
Additional context
It would enable people to get up and running with Hamilton faster. E.g. they could provide a script, and we would "walk" the script and guess what should be output...
Comment by elijahbenizzy
Saturday Jun 04, 2022 at 21:34 GMT
Interesting... Another approach is to use some version of pandas that traces the operations and defers executing them (I bet koalas does something like this, or ibis), then use the intermediate representation to compile to Hamilton.
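A toy sketch of the tracing idea, written without pandas. `TracingFrame` and `TracedColumn` are hypothetical stand-ins (not koalas, ibis, or Hamilton APIs): reads return objects that record each operation as an expression string instead of computing values, and the recorded column assignments are then compiled into Hamilton-style functions. It only supports `+`, `-`, `*` and `/` between columns and scalars.

```python
class TracedColumn:
    """Stands in for a pandas Series: records an expression instead of computing values."""

    def __init__(self, expr: str, deps: tuple):
        self.expr = expr  # source text of the expression built so far
        self.deps = deps  # column names the expression reads from

    def _binop(self, other, op: str) -> "TracedColumn":
        if isinstance(other, TracedColumn):
            # Merge dependencies, dropping duplicates but keeping first-seen order.
            deps = tuple(dict.fromkeys(self.deps + other.deps))
            return TracedColumn(f"({self.expr} {op} {other.expr})", deps)
        # Scalars are embedded literally via repr().
        return TracedColumn(f"({self.expr} {op} {other!r})", self.deps)

    def __add__(self, other):
        return self._binop(other, "+")

    def __sub__(self, other):
        return self._binop(other, "-")

    def __mul__(self, other):
        return self._binop(other, "*")

    def __truediv__(self, other):
        return self._binop(other, "/")


class TracingFrame:
    """Stands in for a DataFrame: reads return traced columns, writes are recorded."""

    def __init__(self):
        self.assigned = {}  # output column name -> TracedColumn (assumed, not checked)

    def __getitem__(self, name: str) -> TracedColumn:
        # Every read becomes a named dependency, whether the column is an input
        # or was assigned earlier; Hamilton wires functions together by name.
        return TracedColumn(name, (name,))

    def __setitem__(self, name: str, value: TracedColumn) -> None:
        self.assigned[name] = value  # a later assignment to the same name wins

    def to_hamilton(self) -> str:
        functions = []
        for name, col in self.assigned.items():
            params = ", ".join(f"{dep}: pd.Series" for dep in col.deps)
            functions.append(f"def {name}({params}) -> pd.Series:\n    return {col.expr}\n")
        return "\n\n".join(functions)


df = TracingFrame()
df["a"] = df["b"] + df["c"]
df["d"] = df["a"] * 2
print(df.to_hamilton())
```

The appeal of this design is that the user's script runs unchanged except that `df` is swapped for the tracing object, so no parsing is needed; the cost is that every pandas operation the script uses has to be traced, which is why a library that already defers execution would be the more realistic starting point.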
Comment by elijahbenizzy
Saturday Oct 29, 2022 at 17:07 GMT
Started working on this -- got a rough POC working, but it's fairly complicated and wasn't worth my time.
What I'd love is doing this on the "intro to Hamilton" webpage -- live conversion with an example we can get to. Will take another stab with my old prototype.
OK, revisiting this: GPT-4 could be really good here... GPT-3.5 was able to do this decently...
https://github.com/DAGWorks-Inc/hamilton/pull/583 gets this started