
Tracking: parity with Postgres for TPC-H cardinality estimations

Open wangpatrick57 opened this issue 11 months ago • 0 comments

Notes

  • Sometimes Postgres does really badly (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say "fix", I mean "match Postgres".
    • This is because we know exactly what we need to do to match Postgres, but we don't know what we need to do to match the truecard.
  • Experiments were run with scale factor 1.0 and seed 15721
  • If no other PR is mentioned, then the query was run based on #132
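
As a minimal sketch of the "fixed means matches Postgres" rule above: the helper name `matches_postgres` and the dictionary layout are hypothetical, the figures are copied from the Queries list in this issue, and the reading of dfcard as optd's own estimate is an assumption.

```python
def matches_postgres(pgcard: int, dfcard: int) -> bool:
    """A query counts as fixed when our estimate equals the Postgres estimate.

    truecard is deliberately ignored: the goal is parity with Postgres,
    not accuracy against the true cardinality.
    """
    return dfcard == pgcard


# Figures copied from the Queries list in this issue.
reported = {
    "Q5": {"truecard": 5, "pgcard": 25, "dfcard": 25},
    "Q7": {"truecard": 4, "pgcard": 6119, "dfcard": 125000},
    "Q12": {"truecard": 2, "pgcard": 7, "dfcard": 7},
}

status = {
    query: matches_postgres(cards["pgcard"], cards["dfcard"])
    for query, cards in reported.items()
}

for query, fixed in status.items():
    print(f"{query}: {'[x]' if fixed else '[ ]'}")
```

Q5 counts as fixed even though both estimates are 5x the truecard, which is exactly the trade-off described in the first note.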

Queries

  • [ ] Q1
    • Not running. See #68
  • [x] Q2 (already matching in #132)
  • [x] Q3: truecard=10, pgcard=10, dfcard=10
    • Fixed by #138
  • [ ] Q4
    • Not running. See #68
  • [x] Q5: truecard=5, pgcard=25, dfcard=25
    • Fixed by #144
  • [x] Q6: truecard=1, pgcard=1, dfcard=1
    • #143 revealed the problem here
    • Fixed by #144
  • [ ] Q7: truecard=4, pgcard=6119, dfcard=125000
    • Fixing join predicates and multi-dim group by would definitely help with this, but it's not clear whether that would fix it completely.
    • #145 changed dfcard from 1 to 125000
  • [ ] Q8: truecard=2, pgcard=2406, dfcard=200
    • Fixing single-dim group by and pulling expressions up to the group by should fix this. Postgres identifies that the group by is done on EXTRACT(year FROM orders.o_orderdate) and simply uses the N-Distinct of orders.o_orderdate as the cardinality of the query.
  • [ ] Q9: truecard=175, pgcard=60150, dfcard=5000
    • Fixing single-dim group by and pulling expressions up to the group by should fix this. If you remove the p_name like '%forest' predicate and use o_orderdate directly as o_year, you get exactly 60150 rows.
    • #145 changed dfcard from 25 to 5000
  • [x] Q10: truecard=20, pgcard=20, dfcard=20
    • Fixed by #138 and #143
  • [ ] Q11: truecard=869, pgcard=10667, dfcard=67936
    • #145 changed dfcard from 1 to 67936
    • I'm not sure how Postgres gets to 10667.
  • [x] Q12: truecard=2, pgcard=7, dfcard=7
    • Fixed by #143 and #144
  • [x] Q13: truecard=42, pgcard=200, dfcard=200
    • Fixed by #145
  • [x] Q14: truecard=1, pgcard=1, dfcard=1
    • Making aggregates give rows=1 should fix this. It's just an aggregate.
    • Fixed by #144
  • [ ] Q15
    • Not running. See #68
  • [ ] Q16
    • Not running. See #68
  • [x] Q17: truecard=1, pgcard=1, dfcard=1
    • Making aggregates give rows=1 should fix this. It's just an aggregate.
    • Fixed by #144
  • [ ] Q18
    • Not running. See #68
  • [x] Q19: truecard=1, pgcard=1, dfcard=1
  • [ ] Q20
    • Not running. See #68
  • [ ] Q21
    • Not running. See #68
  • [ ] Q22
    • Not running. See #68
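
The Q8 note describes Postgres estimating a single-dim group by from the N-Distinct of the column underneath the grouping expression. A rough sketch of that idea follows. It is not optd's or Postgres's actual code: `GroupExpr`, `estimate_group_card`, and the `input_rows` value are hypothetical. The 2406 figure is the Q8 pgcard from this issue, which the note says equals the N-Distinct of orders.o_orderdate. The fallback of 200 and the clamp to the input row count are assumptions about how such an estimator would typically behave.

```python
from dataclasses import dataclass

# Assumed fallback when no N-Distinct statistic is available for a column.
DEFAULT_NUM_DISTINCT = 200


@dataclass(frozen=True)
class GroupExpr:
    """A group-by expression and the single base column it is computed from."""
    text: str
    base_column: str


def estimate_group_card(expr: GroupExpr, ndistinct: dict, input_rows: int) -> float:
    """Estimate the number of groups for a single-dim group by.

    The expression is "pulled up" to its base column: the function wrapped
    around the column is ignored and the column's N-Distinct is used directly.
    """
    n = ndistinct.get(expr.base_column, DEFAULT_NUM_DISTINCT)
    # Assumption: there can never be more groups than input rows.
    return min(float(n), float(input_rows))


o_year = GroupExpr("EXTRACT(year FROM orders.o_orderdate)", "orders.o_orderdate")
stats = {"orders.o_orderdate": 2406}  # implied by Q8's pgcard in this issue

# input_rows is a made-up value, used only to show the clamp is inactive here.
q8_estimate = estimate_group_card(o_year, stats, input_rows=100_000)
print(q8_estimate)
```

Ignoring the EXTRACT wrapper is what makes the estimate so far from the truecard of 2 for Q8, but it is the behavior that would need to be reproduced to match Postgres.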

wangpatrick57, Mar 22 '24 13:03