deepdb-public

Cardinality Estimation Anomalies for ORDERKEY and PARTKEY in TPC-H Dataset (SF=1)

Open Koki05410 opened this issue 1 year ago • 0 comments

I've been working with the TPC-H dataset (scale factor 1) in DeepDB and noticed an unusual pattern in cardinality estimation (CE). When I query numerical columns with limited distinct values, such as ORDERKEY and PARTKEY in the LINEITEM table (6,001,215 rows in total), the estimates are always either exactly 1 or a multiple of the inverse of the sampling rate (roughly 6 here, i.e. 6,001,215 / 1,000,000). For example, with samples_per_spn = 1000000 1000000 1000000 1000000 1000000, the CE results were 1, 6, 12, 18, ... This happens even after I list these columns under the no_compression section of the schema file to rule out compression effects. I'd appreciate any guidance or recommendations for mitigating this issue.
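To make the pattern concrete, here is a minimal sketch of the arithmetic that would produce these values. It does not call DeepDB and is not a description of its internals. It assumes that an estimate is a match count in a sample scaled up by total rows / sample size, and that the result is floored at 1. The function name `scaled_estimate` and the floor at 1 are assumptions made for illustration.

```python
TOTAL_ROWS = 6_001_215    # LINEITEM rows at SF=1 (from the report above)
SAMPLE_SIZE = 1_000_000   # one samples_per_spn entry (from the report above)


def scaled_estimate(sample_matches: int,
                    total_rows: int = TOTAL_ROWS,
                    sample_size: int = SAMPLE_SIZE) -> int:
    """Scale a match count seen in a sample up to the full table.

    Assumption (not confirmed for DeepDB): the estimate is rounded to an
    integer and never drops below 1.
    """
    scale = total_rows / sample_size  # inverse sampling rate, about 6.0
    return max(1, round(sample_matches * scale))


# Predicates matching 0, 1, 2, 3 sampled rows give the quantized estimates.
estimates = [scaled_estimate(k) for k in range(4)]
print(estimates)  # [1, 6, 12, 18]
```

Under these assumptions, a value that matches only a handful of sampled rows can only produce estimates on this coarse grid, which would match the 1, 6, 12, 18, ... sequence reported above.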

Koki05410 • Jan 31 '24 00:01