optd
Tracking: parity with Postgres for TPC-H cardinality estimations
Notes
- Sometimes Postgres does really badly (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say "fix", I mean "match Postgres".
- This is because we know exactly what we need to do to match Postgres but we don't know what we need to do to match the truecard.
- Experiments were run with scale factor 1.0 and seed 15721
- If no other PR is mentioned, then the query was run based on #132
Queries
- [ ] Q1
- Not running. See #68
- [x] Q2 (already matching in #132)
- [x] Q3: truecard=10, pgcard=10, dfcard=10
- Fixed by #138
- [ ] Q4
- Not running. See #68
- [x] Q5: truecard=5, pgcard=25, dfcard=25
- Fixed by #144
- [x] Q6: truecard=1, pgcard=1, dfcard=1
- #143 revealed the problem here
- Fixed by #144
- [ ] Q7: truecard=4, pgcard=6119, dfcard=125000
- Fixing join predicates and fixing multi-dim group by would definitely help with this, but it's not clear whether it would completely fix it.
- #145 changed dfcard from 1 to 125000
- [ ] Q8: truecard=2, pgcard=2406, dfcard=200
- Fixing single-dim group by and pulling expressions up to the group by should fix this. Postgres identifies that the group by is done on `EXTRACT(year FROM orders.o_orderdate)` and it simply uses the N-Distinct of `orders.o_orderdate` as the cardinality of the query.
- [ ] Q9: truecard=175, pgcard=60150, dfcard=5000
- Fixing single-dim group by and pulling expressions up to the group by should fix this. When you get rid of `p_name like '%forest'` and just use `o_orderdate as o_year`, you get exactly 60150 rows.
- #145 changed dfcard from 25 to 5000
- [x] Q10: truecard=20, pgcard=20, dfcard=20
- Fixed by #138 and #143
- [ ] Q11: truecard=869, pgcard=10667, dfcard=67936
- #145 changed dfcard from 1 to 67936
- I'm not sure how Postgres gets to 10667.
- [x] Q12: truecard=2, pgcard=7, dfcard=7
- Fixed by #143 and #144
- [x] Q13: truecard=42, pgcard=200, dfcard=200
- Fixed by #145
- [x] Q14: truecard=1, pgcard=1, dfcard=1
- Making aggregates give rows=1 should fix this. It's just an aggregate.
- Fixed by #144
- [ ] Q15
- Not running. See #68
- [ ] Q16
- Not running. See #68
- [x] Q17: truecard=1, pgcard=1, dfcard=1
- Making aggregates give rows=1 should fix this. It's just an aggregate.
- Fixed by #144
- [ ] Q18
- Not running. See #68
- [x] Q19: truecard=1, pgcard=1, dfcard=1
- [ ] Q20
- Not running. See #68
- [ ] Q21
- Not running. See #68
- [ ] Q22
- Not running. See #68
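
For reference, here is a minimal sketch of the aggregate estimate described in the Q8, Q9, Q14 and Q17 notes above. The function name `estimate_agg_cardinality` and its parameters are hypothetical, not optd's or Postgres's actual API. The single-column case follows the Q8 note (use the N-Distinct of the grouped column), and the no-group-by case follows the Q14/Q17 notes (rows=1). The multi-column case multiplies the per-column N-Distincts and clamps to the input row count; that is a naive assumption, not a claim about what Postgres does, although 25 × 2406 does happen to equal the 60150 reported for Q9. The input row counts in the usage lines are illustrative only.

```python
from math import prod
from typing import Sequence


def estimate_agg_cardinality(input_rows: float, group_ndistincts: Sequence[float]) -> float:
    """Hypothetical helper: estimate the output rows of an aggregate node.

    input_rows       -- estimated rows flowing into the aggregate
    group_ndistincts -- N-Distinct of each group-by column (empty if no group by)
    """
    if not group_ndistincts:
        # A plain aggregate with no group by always produces exactly one row.
        return 1.0
    # Single-dim group by: the N-Distinct of the grouped column.
    # Multi-dim group by (assumption): product of the N-Distincts,
    # clamped so the estimate never exceeds the number of input rows.
    return float(min(input_rows, prod(group_ndistincts)))


# Usage with numbers taken from the notes above (input_rows is illustrative).
print(estimate_agg_cardinality(1_500_000, []))          # plain aggregate
print(estimate_agg_cardinality(1_500_000, [2406]))      # Q8-style single-dim group by
print(estimate_agg_cardinality(1_500_000, [25, 2406]))  # two group-by columns
```

Clamping to `input_rows` keeps the estimate sane when the product of N-Distincts exceeds the number of rows that actually reach the aggregate.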