mario
if the lhs of one stage is the exact same data frame as the rhs of the prior stage, don't output both to JSON
Right now my trace canonicalization script removes this redundancy when possible, but ideally it would be removed earlier, in the tracer itself. My hunch is that the tracer can do a == (pointer equals) operation to compare the rhs of one stage to the lhs of the next stage, since they should be the same data frame. If they're indeed the same data frame, then there's no point in writing it out twice to JSON, which doubles the space usage (and possibly memory usage too, because JSON encoding can be memory-intensive).
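To make the idea concrete, here is a rough sketch in Python rather than in the tracer's actual code. It assumes each stage is available as an (lhs, rhs) pair of in-memory table objects; `encode_table`, `encode_stages`, the `rows` field, and the sample values are all hypothetical names and data made up for illustration. Python's `is` stands in for the pointer-equality check.

```python
import json

# Marker string standing in for "same object as the previous stage's rhs".
PREV_RHS = "prev_rhs"


def encode_table(table):
    """Hypothetical encoder: copy a table into a JSON-ready dict."""
    return {
        "col_names": list(table["col_names"]),
        "rows": [list(row) for row in table["rows"]],  # "rows" is an assumed field
    }


def encode_stages(stages):
    """Encode (lhs, rhs) pairs, skipping any lhs that is the very same
    object as the previous stage's rhs."""
    out = []
    prev_rhs = None
    for lhs, rhs in stages:
        # Identity check, not a deep comparison: cheap, and it only fires
        # when both names refer to one object in memory.
        if prev_rhs is not None and lhs is prev_rhs:
            lhs_json = PREV_RHS
        else:
            lhs_json = encode_table(lhs)
        out.append({"tables": {"lhs": lhs_json, "rhs": encode_table(rhs)}})
        prev_rhs = rhs
    return out


# Illustrative data only.
t0 = {"col_names": ["carb", "optden"], "rows": [[0.1, 0.086], [0.3, 0.269]]}
t1 = {"col_names": ["carb", "optden"], "rows": [[0.1, 0.086]]}
t2 = {"col_names": ["carb"], "rows": [[0.1]]}

trace = encode_stages([(t0, t1), (t1, t2)])
print(json.dumps(trace, indent=2))
```

Note that an equal-but-separate copy of a table would still be written out in full under this check. Whether the tracer's runtime actually hands the same object from one stage to the next is the hunch that would need verifying.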
the way i'd encode it is something like
lhs: "prev_rhs"
"tables": {
"lhs": "prev_rhs",
"rhs": {
"col_names": [
"carb",
"optden"
],