spark-sql-perf
Use CHAR/VARCHAR types in TPCDSTables
The TPC-DS schemas differ between TPCDSTables in spark-sql-perf and TPCDSBase in Spark master/branch-3.1 (string vs. char/varchar). For example:
// spark
"reason" ->
  """
    |`r_reason_sk` INT,
    |`r_reason_id` CHAR(16),
    |`r_reason_desc` CHAR(100)
  """.stripMargin,

// spark-sql-perf
Table("reason",
  partitionColumns = Nil,
  'r_reason_sk .int,
  'r_reason_id .string,
  'r_reason_desc .string),
To generate TPC-DS table data for Spark (master/branch-3.1), it would be nice to use CHAR/VARCHAR types in TPCDSTables.
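To illustrate the difference, here is a minimal, dependency-free sketch of the two column definitions for the reason table. Everything in it (TpcdsSchemaSketch, ColType, CharCol, VarcharCol, Column, toDdl) is a hypothetical name made up for this example; it is not the spark-sql-perf DSL or any Spark API, and it only models how the declared types would change.

```scala
// Hypothetical model for illustration only; not the spark-sql-perf or Spark API.
object TpcdsSchemaSketch {
  // A column type that knows how to render itself in DDL.
  sealed trait ColType { def sql: String }
  case object IntCol extends ColType { val sql = "INT" }
  case object StringCol extends ColType { val sql = "STRING" }
  final case class CharCol(length: Int) extends ColType { val sql = s"CHAR($length)" }
  final case class VarcharCol(length: Int) extends ColType { val sql = s"VARCHAR($length)" }

  final case class Column(name: String, colType: ColType)

  // Render a column list in the same shape as the Spark snippet above.
  def toDdl(columns: Seq[Column]): String =
    columns.map(c => s"`${c.name}` ${c.colType.sql}").mkString(",\n")

  // Current spark-sql-perf style: every text column is a plain string.
  val reasonWithStrings: Seq[Column] = Seq(
    Column("r_reason_sk", IntCol),
    Column("r_reason_id", StringCol),
    Column("r_reason_desc", StringCol))

  // Proposed style: fixed-length CHAR columns, as in Spark's TPCDSBase.
  val reasonWithChars: Seq[Column] = Seq(
    Column("r_reason_sk", IntCol),
    Column("r_reason_id", CharCol(16)),
    Column("r_reason_desc", CharCol(100)))
}
```

Calling TpcdsSchemaSketch.toDdl(TpcdsSchemaSketch.reasonWithChars) yields the same three column lines as the Spark definition quoted above, while reasonWithStrings renders both text columns as STRING.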
NOTE: This ticket comes from https://github.com/apache/spark/pull/31886
https://github.com/databricks/spark-sql-perf/pull/201
Is there a specific reason this schema was defined this way in the first place, rather than using the schema given in the TPC documentation? http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf