spark-sql-perf
Is Hive a prerequisite?
Reading through the code, I get the impression that a pre-installed Hive is needed. Is this correct? When writing to an external table, I can see that the Spark driver assumes the destination is a Hive database.
If Hive is indeed required, it would be worth stating that in the README.md.
@hansbogert To use spark-sql-perf, you only need Spark and the TPC-DS toolkit. A pre-installed Hive is not needed. However, you will probably want to build Spark with the -Phive profile, which adds Hive as a dependency. You can then use HiveContext, which has a parser with better SQL coverage and metastore support. The createExternalTable method uses the Hive metastore to persist metadata (you can just use the built-in Derby metastore).
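To illustrate, here is a minimal sketch of registering an external table through HiveContext without a separate Hive installation. It assumes a Spark 1.x build that includes the -Phive profile; the application name, the table name `store_sales`, and the path `/tmp/tpcds/store_sales` are hypothetical placeholders, and Parquet data is assumed to already exist at that path. This is not how spark-sql-perf itself is wired up, only an example of the underlying calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ExternalTableSketch {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration only; the app name is a placeholder.
    val conf = new SparkConf()
      .setAppName("external-table-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // HiveContext needs Spark built with -Phive, not a Hive installation.
    // With no hive-site.xml on the classpath it falls back to an embedded
    // Derby metastore created in the current working directory.
    val sqlContext = new HiveContext(sc)

    // Hypothetical location of existing Parquet data.
    val path = "/tmp/tpcds/store_sales"

    // Persists the table metadata in the metastore; the data stays at `path`.
    sqlContext.createExternalTable("store_sales", path, "parquet")

    sqlContext.sql("SELECT COUNT(*) FROM store_sales").show()

    sc.stop()
  }
}
```

With the embedded Derby metastore, running this typically leaves a `metastore_db` directory in the working directory, which is where the table metadata lives between runs.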