
feat(core): record all operations that led to dataframe

Status: Open · maartenbreddels opened this issue 5 years ago · 0 comments

This proof of concept (POC) addresses several issues:

  • All operations (opening, joining, groupby, etc.) can be recorded in the df.operation member.
  • A dataframe can be serialized more faithfully, since it knows exactly how it was constructed.
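The recording idea in the bullets above can be sketched as a chain of nodes, assuming each operation stores a `type`, a `name`, its `parameters`, and a link to the `child` operation it was applied to (the shape of the JSON example in this issue). The `Operation` class, its `to_dict` method, and the `example.hdf5` path are hypothetical illustrations, not vaex API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Operation:
    """One recorded step (hypothetical); `child` is the operation it was applied to."""
    type: str  # "source" (e.g. open) or "transformation"
    name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    child: Optional["Operation"] = None

    def to_dict(self) -> Dict[str, Any]:
        # Serialize recursively; a source node has no "child" key.
        d = {"type": self.type, "name": self.name, "parameters": dict(self.parameters)}
        if self.child is not None:
            d["child"] = self.child.to_dict()
        return d


# Each new operation wraps the previous one, so the outermost node is the latest step.
source = Operation("source", "open", {"path": "example.hdf5"})
op = Operation("transformation", "add_virtual_column",
               {"name": "r", "expression": "(x + y)", "column_position": 10},
               child=source)
print(json.dumps(op.to_dict(), indent=2))
```

Because the latest operation owns the whole history, serializing a dataframe in this sketch reduces to serializing its outermost node.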

The dataframe state is still useful, but how it and the recorded operations should work together is something I still need to think through.

Example of the recorded operations serialized to JSON:

```json
{
  "type": "transformation",
  "name": "add_virtual_column",
  "parameters": {
    "name": "r",
    "expression": "(__r + y)",
    "column_position": 10
  },
  "child": {
    "type": "transformation",
    "name": "rename_column",
    "parameters": {
      "old": "r",
      "new": "__r"
    },
    "child": {
      "type": "transformation",
      "name": "add_virtual_column",
      "parameters": {
        "name": "r",
        "expression": "(x + y)",
        "column_position": 10
      },
      "child": {
        "type": "source",
        "name": "open",
        "parameters": {
          "path": "/Users/maartenbreddels/src/vaex/data/helmi-dezeeuw-2000-10p.hdf5"
        }
      }
    }
  }
}
```
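Since each node points to the operation it was applied to, the outermost node is the most recent step and the source sits at the bottom. Replaying the history therefore means walking the `child` links down to the source and reversing the result. A minimal sketch, assuming the JSON layout of the example above; `operations_in_order` is a hypothetical helper and `data.hdf5` stands in for the real path.

```python
import json
from typing import Any, Dict, List


def operations_in_order(node: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return operations from the source (first) to the latest transformation (last)."""
    chain = []
    while node is not None:
        # Keep each step without its nested child to get a flat record.
        chain.append({k: v for k, v in node.items() if k != "child"})
        node = node.get("child")
    chain.reverse()
    return chain


record = json.loads("""
{"type": "transformation", "name": "add_virtual_column",
 "parameters": {"name": "r", "expression": "(__r + y)", "column_position": 10},
 "child": {"type": "transformation", "name": "rename_column",
           "parameters": {"old": "r", "new": "__r"},
           "child": {"type": "transformation", "name": "add_virtual_column",
                     "parameters": {"name": "r", "expression": "(x + y)", "column_position": 10},
                     "child": {"type": "source", "name": "open",
                               "parameters": {"path": "data.hdf5"}}}}}
""")

for step in operations_in_order(record):
    print(step["type"], step["name"], step["parameters"])
```

The flat list reads in the same order as the code that produced it: open, add `r`, rename `r` to `__r`, add the new `r`.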

This record corresponds to the following code:

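```python
import vaex
path = "/Users/maartenbreddels/src/vaex/data/helmi-dezeeuw-2000-10p.hdf5"
df = vaex.open(path, execute=False)  # recorded as the "open" source
df['r'] = df.x + df.y  # recorded as add_virtual_column
df['r'] = df.r + df.y  # uses the old r: recorded as rename_column (r -> __r), then add_virtual_column
```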
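The `rename_column` step in the record comes from the second assignment: `df.r + df.y` still refers to the old `r`, so the old column is kept under the hidden name `__r` and the new expression is rewritten to use it. The toy class below mimics that bookkeeping with string expressions. It assumes the opened file has ten columns and that a replacing column takes the position of the column it replaces (consistent with both `column_position` values being 10 in the record); `RecordingFrame`, its column names, and `data.hdf5` are hypothetical, and this is not how vaex implements it.

```python
import re
from typing import Any, Dict, List


class RecordingFrame:
    """Toy stand-in for a dataframe that records assignments (hypothetical, not vaex)."""

    def __init__(self, columns: List[str], path: str) -> None:
        self.columns = list(columns)
        # The source operation is the innermost node of the record.
        self.operation: Dict[str, Any] = {
            "type": "source", "name": "open", "parameters": {"path": path}}

    def _record(self, name: str, parameters: Dict[str, Any]) -> None:
        # Each new operation wraps the previous record as its child.
        self.operation = {"type": "transformation", "name": name,
                          "parameters": parameters, "child": self.operation}

    def __setitem__(self, name: str, expression: str) -> None:
        pattern = rf"\b{re.escape(name)}\b"
        if name in self.columns:
            position = self.columns.index(name)
            if re.search(pattern, expression):
                # The new expression needs the old values: keep them under a hidden name.
                hidden = "__" + name
                self._record("rename_column", {"old": name, "new": hidden})
                self.columns[position] = hidden
                expression = re.sub(pattern, hidden, expression)
            else:
                # Toy assumption: a plain overwrite just replaces the column.
                del self.columns[position]
        else:
            position = len(self.columns)  # new columns are appended
        self.columns.insert(position, name)
        self._record("add_virtual_column", {
            "name": name, "expression": expression, "column_position": position})


# Ten placeholder columns, so the first new column lands at position 10.
frame = RecordingFrame([f"c{i}" for i in range(8)] + ["x", "y"], path="data.hdf5")
frame["r"] = "(x + y)"
frame["r"] = "(r + y)"  # mirrors df['r'] = df.r + df.y
```

With these assumptions, `frame.operation` has the same nesting as the JSON example in this issue, apart from the path.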

maartenbreddels, Jan 09 '20 16:01