Use CHAR/VARCHAR types in TPCDSTables

Open maropu opened this issue 4 years ago • 2 comments

The TPC-DS schemas differ between TPCDSTables in spark-sql-perf and TPCDSBase in Spark (master/branch-3.1): the former uses string where the latter uses char/varchar. For example:

// spark
    "reason" ->
      """
        |`r_reason_sk` INT,
        |`r_reason_id` CHAR(16),
        |`r_reason_desc` CHAR(100)
      """.stripMargin,

// spark-sql-perf
    Table("reason",
      partitionColumns = Nil,
      'r_reason_sk               .int,
      'r_reason_id               .string,
      'r_reason_desc             .string),

To generate TPC-DS table data for Spark (master/branch-3.1), it would be nice to use CHAR/VARCHAR types in TPCDSTables.
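To make the difference concrete, here is a minimal, dependency-free sketch of the `reason` table modeled both ways. Everything in it (`ColumnType`, `CharCol`, `VarcharCol`, `ColumnDef`, `TableDef`, `ReasonSchemas`) is a hypothetical name made up for illustration; it is not the spark-sql-perf `Table` DSL or any Spark API, only a way to show what carrying the length in the column type would buy.

```scala
// Hypothetical sketch: none of these names come from spark-sql-perf or Spark.
sealed trait ColumnType { def sql: String }
case object IntCol extends ColumnType { val sql = "INT" }
case object StringCol extends ColumnType { val sql = "STRING" }
final case class CharCol(length: Int) extends ColumnType { val sql = s"CHAR($length)" }
final case class VarcharCol(length: Int) extends ColumnType { val sql = s"VARCHAR($length)" }

final case class ColumnDef(name: String, tpe: ColumnType) {
  // Renders one column in the same style as the Spark-side DDL string.
  def ddl: String = s"`$name` ${tpe.sql}"
}

final case class TableDef(name: String, columns: Seq[ColumnDef]) {
  // Joins the column definitions with ",\n", one column per line.
  def ddl: String = columns.map(_.ddl).mkString(",\n")
}

object ReasonSchemas {
  // Mirrors the current spark-sql-perf definition: lengths are lost.
  val withStrings: TableDef = TableDef("reason", Seq(
    ColumnDef("r_reason_sk", IntCol),
    ColumnDef("r_reason_id", StringCol),
    ColumnDef("r_reason_desc", StringCol)))

  // Mirrors the Spark-side definition quoted above: lengths are kept.
  val withChars: TableDef = TableDef("reason", Seq(
    ColumnDef("r_reason_sk", IntCol),
    ColumnDef("r_reason_id", CharCol(16)),
    ColumnDef("r_reason_desc", CharCol(100))))
}
```

With this shape, `ReasonSchemas.withChars.ddl` yields the same three column lines as the Spark snippet above, while `withStrings.ddl` yields `STRING` for the two text columns.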

NOTE: This ticket comes from https://github.com/apache/spark/pull/31886

maropu, Mar 30 '21 01:03

https://github.com/databricks/spark-sql-perf/pull/201

maropu, Apr 26 '21 04:04

Is there a specific reason this schema was created in the first place, rather than using the schema given in the TPC documentation? http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf

zhaner08, Jul 07 '21 00:07