if lhs of one step is the exact same data frame as rhs of the prior step, don't output both to JSON

Open pgbovine opened this issue 4 years ago • 2 comments

Right now my trace canonicalization script removes this redundancy when possible, but ideally it would be removed earlier, in the tracer itself. That's because my hunch is that the tracer can do an == (pointer equals) check to compare the rhs of one step to the lhs of the next step, since they're the same data frame. If they're indeed the same data frame, then there's no point in writing it out to JSON twice, which doubles the space used for that table (and possibly memory usage too, because JSON encoding can be memory-intensive).
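A rough sketch of the pointer-equality idea, written in Python only for illustration (the tracer's actual language and data structures aren't shown in this thread). The step layout with `lhs`/`rhs` keys and the helper name `shared_with_previous` are assumptions, and plain dicts stand in for data frames. Python's `is` compares object identity, which is the "pointer equals" check described above.

```python
def shared_with_previous(steps):
    """For each step, report whether its lhs is the very same object
    (identity, not value equality) as the previous step's rhs."""
    flags = []
    prev_rhs = None
    for step in steps:
        # `is` is an identity check: True only for the same object in memory.
        flags.append(prev_rhs is not None and step["lhs"] is prev_rhs)
        prev_rhs = step["rhs"]
    return flags


# Hypothetical stand-ins for data frames.
a = {"col_names": ["carb", "optden"]}
b = {"col_names": ["carb"]}
c = {"col_names": ["optden"]}

steps = [
    {"lhs": a, "rhs": b},
    {"lhs": b, "rhs": c},        # lhs is the same object as the prior rhs
    {"lhs": dict(c), "rhs": a},  # equal by value, but a distinct copy
]

print(shared_with_previous(steps))  # [False, True, False]
```

Note that the identity check only catches the case where the same object is passed along. A copy with identical contents is not detected, which keeps the check cheap but means some redundancy may still need to be cleaned up afterwards.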

Nov 06 '21 21:11 pgbovine

the way i'd encode it is something like

  lhs: "prev_rhs"

Nov 19 '21 22:11 pgbovine

      "tables": {
        "lhs": "prev_rhs",
        "rhs": {
          "col_names": [
            "carb",
            "optden"
          ],

Nov 19 '21 22:11 pgbovine
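To make the proposed encoding concrete, here is a minimal round-trip sketch in Python, again only for illustration. The `"tables"`, `"lhs"`, `"rhs"`, and `"prev_rhs"` names come from the fragment above; the helper names `encode_steps` and `decode_steps`, the list-of-steps layout, and the sample tables are assumptions. One caveat of this scheme is that a reader has to process steps in order, since `"prev_rhs"` only makes sense relative to the preceding step.

```python
import json

PREV_RHS = "prev_rhs"  # marker string taken from the proposal above


def encode_steps(steps):
    """Serialize steps, replacing an lhs that is the same object as the
    prior step's rhs with the PREV_RHS marker."""
    out = []
    prev_rhs = None
    for step in steps:
        lhs = step["lhs"]
        same = prev_rhs is not None and lhs is prev_rhs
        out.append({"tables": {"lhs": PREV_RHS if same else lhs,
                               "rhs": step["rhs"]}})
        prev_rhs = step["rhs"]
    return json.dumps(out)


def decode_steps(text):
    """Parse the JSON and expand each PREV_RHS marker back into the
    previous step's rhs table."""
    steps = json.loads(text)
    prev_rhs = None
    for step in steps:
        tables = step["tables"]
        if tables["lhs"] == PREV_RHS:
            if prev_rhs is None:
                raise ValueError("first step cannot refer to prev_rhs")
            tables["lhs"] = prev_rhs
        prev_rhs = tables["rhs"]
    return steps


# Hypothetical tables; only the column names echo the fragment above.
t1 = {"col_names": ["carb", "optden"]}
t2 = {"col_names": ["carb"]}
t3 = {"col_names": ["optden"]}

encoded = encode_steps([{"lhs": t1, "rhs": t2}, {"lhs": t2, "rhs": t3}])
decoded = decode_steps(encoded)
```

After decoding, the second step's `lhs` holds the same contents as the first step's `rhs`, while the encoded text carries that table only once.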