materialize
materialize copied to clipboard
Improve representation of IR and metadata
Current design choices
- Use
Let
to express sharing - Store metadata (types, keys) in
RelationExpr::Get
rather than look up the correspondingLet
- Recalculate metadata for other RelationExpr on the fly
- Mutate RelationExprs during transforms
- Do CSE as a final transform
Problems with the current design:
- Stored metadata gets out of sync (eg we mutate the value of a
Let
in a way that adds/removes keys, but don't update the correspondingGet
) - If we wanted to avoid storing metadata, when calling
Get{..}.typ()
we'd need some way to find walk up the tree to the correspondingLet
- Recalculating metadata is expensive and we have to code defensively to avoid performance problems
- Caching metadata is difficult because RelationExpr is often mutated
- Some metadata is only derived in the sql planner and can't be reconstructed (eg the scale/precision of decimals inside a ScalarExpr). If we want to eg pull out part of a Filter into a Map we can't figure out the correct ColumnType.
- Recent transforms to improve subqueries require inlining all
Let
s, so we lose the benefit of expressing sharing explicitly but still bear all the costs.
Two proposed changes.
Scalar type inference:
- record precision/scale in every ScalarExpr that computes with decimals, so that we can write
ScalarExpr::typ(input_typ: RelationType) -> ScalarType
- remove
ColumnType
from Map
Sharing:
- Get is only used for importing views
- rather than Let, we express sharing by pointing all instances at the same RelationExpr
- the Unbind and InlineLet transforms are no longer needed
- transforms use
map
rather thanvisit_mut
- we do some kind of clever interning so that two RelationExpr nodes with the same inputs should always point to the same allocation
- now that RelationExprs are immutable we can cheaply cache metadata by comparing pointer value
Concerns:
- is passing around the intern thing un-ergonomic?
- will transforms break sharing horribly?
Everyone was :+1:
@benesch is taking the first change, @jamii the second.