materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Improve representation of IR and metadata

Open jamii opened this issue 5 years ago • 1 comments

Current design choices

  • Use Let to express sharing
  • Store metadata (types, keys) in RelationExpr::Get rather than look up the corresponding Let
  • Recalculate metadata for other RelationExpr on the fly
  • Mutate RelationExprs during transforms
  • Do CSE as a final transform

Problems with the current design:

  • Stored metadata gets out of sync (eg we mutate the value of a Let in a way that adds/removes keys, but don't update the corresponding Get)
  • If we wanted to avoid storing metadata, when calling Get{..}.typ() we'd need some way to find walk up the tree to the corresponding Let
  • Recalculating metadata is expensive and we have to code defensively to avoid performance problems
  • Caching metadata is difficult because RelationExpr is often mutated
  • Some metadata is only derived in the sql planner and can't be reconstructed (eg the scale/precision of decimals inside a ScalarExpr). If we want to eg pull out part of a Filter into a Map we can't figure out the correct ColumnType.
  • Recent transforms to improve subqueries require inlining all Lets, so we lose the benefit of expressing sharing explicitly but still bear all the costs.

Two proposed changes.

Scalar type inference:

  • record precision/scale in every ScalarExpr that computes with decimals, so that we can write ScalarExpr::typ(input_typ: RelationType) -> ScalarType
  • remove ColumnType from Map

Sharing:

  • Get is only used for importing views
  • rather than Let, we express sharing by pointing all instances at the same RelationExpr
  • the Unbind and InlineLet transforms are no longer needed
  • transforms use map rather than visit_mut
  • we do some kind of clever interning so that two RelationExpr nodes with the same inputs should always point to the same allocation
  • now that RelationExprs are immutable we can cheaply cache metadata by comparing pointer value

Concerns:

  • is passing around the intern thing un-ergonomic?
  • will transforms break sharing horribly?

jamii avatar Oct 08 '19 18:10 jamii

Everyone was :+1:

@benesch is taking the first change, @jamii the second.

jamii avatar Oct 08 '19 19:10 jamii