materialize
materialize copied to clipboard
Improve representation of IR and metadata
Current design choices
- Use
Letto express sharing - Store metadata (types, keys) in
RelationExpr::Getrather than look up the correspondingLet - Recalculate metadata for other RelationExpr on the fly
- Mutate RelationExprs during transforms
- Do CSE as a final transform
Problems with the current design:
- Stored metadata gets out of sync (eg we mutate the value of a
Letin a way that adds/removes keys, but don't update the correspondingGet) - If we wanted to avoid storing metadata, when calling
Get{..}.typ()we'd need some way to find walk up the tree to the correspondingLet - Recalculating metadata is expensive and we have to code defensively to avoid performance problems
- Caching metadata is difficult because RelationExpr is often mutated
- Some metadata is only derived in the sql planner and can't be reconstructed (eg the scale/precision of decimals inside a ScalarExpr). If we want to eg pull out part of a Filter into a Map we can't figure out the correct ColumnType.
- Recent transforms to improve subqueries require inlining all
Lets, so we lose the benefit of expressing sharing explicitly but still bear all the costs.
Two proposed changes.
Scalar type inference:
- record precision/scale in every ScalarExpr that computes with decimals, so that we can write
ScalarExpr::typ(input_typ: RelationType) -> ScalarType - remove
ColumnTypefrom Map
Sharing:
- Get is only used for importing views
- rather than Let, we express sharing by pointing all instances at the same RelationExpr
- the Unbind and InlineLet transforms are no longer needed
- transforms use
maprather thanvisit_mut - we do some kind of clever interning so that two RelationExpr nodes with the same inputs should always point to the same allocation
- now that RelationExprs are immutable we can cheaply cache metadata by comparing pointer value
Concerns:
- is passing around the intern thing un-ergonomic?
- will transforms break sharing horribly?
Everyone was :+1:
@benesch is taking the first change, @jamii the second.