feat(core): record all operations that led to dataframe
This POC solves several issues:
- All operations, such as opening, joining and groupby, can be recorded in the df.operation member.
- A dataframe can be serialized more faithfully, since it knows exactly how it was constructed.
The existing state is still useful, but how it and the recorded operations should work together is something I still need to think about.
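To make the serialization point concrete, here is a minimal Python sketch of an operation record with the same shape as the JSON example: a type, a name, parameters, and a link to the child operation it was applied to. The Operation class and its to_dict/from_dict methods are hypothetical illustrations, not part of vaex's API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Operation:
    """Hypothetical record of one step (not vaex API), linked to the step it was applied to."""
    type: str                                  # "source" or "transformation"
    name: str                                  # e.g. "open", "add_virtual_column"
    parameters: Dict[str, Any] = field(default_factory=dict)
    child: Optional["Operation"] = None        # previous operation; None for a source

    def to_dict(self) -> Dict[str, Any]:
        d: Dict[str, Any] = {"type": self.type, "name": self.name,
                             "parameters": dict(self.parameters)}
        if self.child is not None:
            d["child"] = self.child.to_dict()
        return d

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Operation":
        child = d.get("child")
        return cls(d["type"], d["name"], dict(d.get("parameters", {})),
                   cls.from_dict(child) if child is not None else None)


# Build the chain bottom-up: the source first, then each step wraps the previous one.
op = Operation("source", "open", {"path": "data.hdf5"})
op = Operation("transformation", "add_virtual_column",
               {"name": "r", "expression": "(x + y)"}, child=op)

text = json.dumps(op.to_dict(), indent=2)
restored = Operation.from_dict(json.loads(text))
print(restored == op)  # True: the full construction history survives a JSON round trip
```

Because every node carries its whole history, the serialized form is enough to rebuild the chain without consulting any other state.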
Example of the recorded operations serialized to JSON:
{
  "type": "transformation",
  "name": "add_virtual_column",
  "parameters": {
    "name": "r",
    "expression": "(__r + y)",
    "column_position": 10
  },
  "child": {
    "type": "transformation",
    "name": "rename_column",
    "parameters": {
      "old": "r",
      "new": "__r"
    },
    "child": {
      "type": "transformation",
      "name": "add_virtual_column",
      "parameters": {
        "name": "r",
        "expression": "(x + y)",
        "column_position": 10
      },
      "child": {
        "type": "source",
        "name": "open",
        "parameters": {
          "path": "/Users/maartenbreddels/src/vaex/data/helmi-dezeeuw-2000-10p.hdf5"
        }
      }
    }
  }
}
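Because each node points to the operation it was applied to, the chain reads newest-first. Here is a small plain-Python sketch (no vaex needed) that walks the child links and returns the steps in the order they were executed. The linearize helper is hypothetical, and the path is shortened to its file name.

```python
import json
from typing import Any, Dict, List, Optional


def linearize(op: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return recorded steps oldest-first (the source comes first)."""
    steps = []
    while op is not None:
        # Keep only this step's own fields; the nested child is visited next.
        steps.append({k: v for k, v in op.items() if k != "child"})
        op = op.get("child")
    steps.reverse()  # the outermost JSON node is the most recent operation
    return steps


# The example from above, with the path shortened to its file name.
recorded = json.loads("""
{"type": "transformation", "name": "add_virtual_column",
 "parameters": {"name": "r", "expression": "(__r + y)", "column_position": 10},
 "child": {"type": "transformation", "name": "rename_column",
  "parameters": {"old": "r", "new": "__r"},
  "child": {"type": "transformation", "name": "add_virtual_column",
   "parameters": {"name": "r", "expression": "(x + y)", "column_position": 10},
   "child": {"type": "source", "name": "open",
    "parameters": {"path": "helmi-dezeeuw-2000-10p.hdf5"}}}}}
""")

for step in linearize(recorded):
    print(step["name"], step["parameters"])
```

A flat, oldest-first list like this is also the natural order in which to replay the operations.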
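The outermost object is the most recent operation, and each "child" is the operation it was applied to. This reflects the following code: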
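import vaex
path = "/Users/maartenbreddels/src/vaex/data/helmi-dezeeuw-2000-10p.hdf5"
df = vaex.open(path, execute=False)  # recorded as the "open" source operation
df['r'] = df.x + df.y  # recorded as add_virtual_column r = (x + y)
df['r'] = df.r + df.y  # recorded as rename_column r -> __r, then add_virtual_column r = (__r + y)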
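The second assignment expands into two recorded steps because r already exists: the old column is renamed to __r and the new expression is rewritten to refer to it. The toy RecordingFrame below is a hypothetical sketch of that bookkeeping, not vaex's implementation. Columns are plain expression strings, the renaming rule is inferred from the JSON above, operations are kept as a flat oldest-first list rather than nested children, and column_position is omitted because its exact meaning is not spelled out here.

```python
import re
from typing import Any, Dict, List


class RecordingFrame:
    """Hypothetical toy, not vaex: columns map names to expression strings and
    every change is appended to self.operations (a flat, oldest-first list)."""

    def __init__(self, columns: List[str], path: str) -> None:
        self.columns: Dict[str, str] = {c: c for c in columns}
        self.operations: List[Dict[str, Any]] = [
            {"type": "source", "name": "open", "parameters": {"path": path}}]

    def _record(self, op_name: str, parameters: Dict[str, Any]) -> None:
        self.operations.append(
            {"type": "transformation", "name": op_name, "parameters": parameters})

    def rename_column(self, old: str, new: str) -> None:
        # Rebuild the dict so the renamed column keeps its position.
        self.columns = {(new if k == old else k): v for k, v in self.columns.items()}
        self._record("rename_column", {"old": old, "new": new})

    def __setitem__(self, name: str, expression: str) -> None:
        if name in self.columns:
            # Assumed rule, inferred from the JSON example: move the existing column
            # aside to "__<name>" and point references in the new expression at it.
            hidden = "__" + name
            self.rename_column(name, hidden)
            expression = re.sub(r"\b%s\b" % re.escape(name), hidden, expression)
        self.columns[name] = expression
        self._record("add_virtual_column", {"name": name, "expression": expression})


df = RecordingFrame(["x", "y"], path="helmi-dezeeuw-2000-10p.hdf5")
df["r"] = "(x + y)"   # records add_virtual_column
df["r"] = "(r + y)"   # records rename_column r -> __r, then add_virtual_column
for op in df.operations:
    print(op["name"], op["parameters"])
```

The sketch does not check whether __r is already taken; a real implementation would need to choose a unique hidden name.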