clickhouse-java
Spark JDBC cannot save MAP type
Describe the bug
Using the ClickHouse JDBC driver (com.clickhouse.jdbc.ClickHouseDriver) from Spark, I cannot save a DataFrame that contains a map column, even though the target table has a Map column and the same operation works from other engines and code bases.
The following error gets raised:
Caused by: java.lang.IllegalArgumentException: Can't get JDBC type for map<string,string>
Steps to reproduce
Spin up ClickHouse in a container, then run the PySpark script included below.
Expected behaviour
Being able to save a Map type from Spark using the JDBC driver.
Code example
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, StringType

# The ClickHouse JDBC driver is pulled from Maven when the session starts.
jars = ["com.clickhouse:clickhouse-jdbc:0.4.5"]

spark = (
    SparkSession.builder.appName("map-test")
    .config("spark.streaming.stopGracefullyOnShutdown", True)
    .config("spark.jars.packages", ",".join(jars))
    .config("spark.sql.shuffle.partitions", 4)
    .master("local[*]")
    .getOrCreate()
)

# One row; the "map" column is inferred as map<string,string>.
df = spark.createDataFrame([{"key": "key", "map": {"map": "test"}}])

# The append below fails with: Can't get JDBC type for map<string,string>
(
    df.write.format("jdbc")
    .mode("append")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://clickhouse-local:8123")
    .option("dbtable", "test")
    .option("batchsize", 1)
    .option("isolationLevel", "NONE")
    .save()
)
This happens because Spark has no ClickHouse-specific JDBC dialect, so it does not know how to convert this type to one the ClickHouse JDBC driver accepts: https://github.com/apache/spark/blob/b41ea9162f4c8fbc4d04d28d6ab5cc0342b88cb0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L139-L167
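To make that lookup concrete, here is a rough plain-Python model of the order in which a type gets resolved: ask the dialect first, fall back to a table of common types, and fail if neither knows the type. This is only a sketch, not Spark's code. The function names, the trimmed-down type table, and `hypothetical_clickhouse_dialect` are all assumptions made up for illustration.

```python
from typing import Callable, Optional

# Illustrative subset of a "common types" fallback table (an assumption, not
# Spark's full list). Complex types such as map<...> have no entry here.
COMMON_JDBC_TYPES = {
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "DOUBLE PRECISION",
    "boolean": "BIT(1)",
    "string": "TEXT",
}

# A dialect is modelled as a function: Spark type name -> database type or None.
Dialect = Callable[[str], Optional[str]]


def no_dialect(spark_type: str) -> Optional[str]:
    """Models a generic dialect that overrides nothing."""
    return None


def hypothetical_clickhouse_dialect(spark_type: str) -> Optional[str]:
    """Hypothetical dialect that knows one extra mapping (illustration only)."""
    if spark_type == "map<string,string>":
        return "Map(String, String)"
    return None


def resolve_jdbc_type(spark_type: str, dialect: Dialect = no_dialect) -> str:
    """Resolve a type: dialect first, then the common table, else fail."""
    mapped = dialect(spark_type)
    if mapped is None:
        mapped = COMMON_JDBC_TYPES.get(spark_type)
    if mapped is None:
        raise ValueError(f"Can't get JDBC type for {spark_type}")
    return mapped
```

With the default `no_dialect`, `resolve_jdbc_type("map<string,string>")` raises with the same message as in the report, while passing `hypothetical_clickhouse_dialect` resolves it. In real Spark the equivalent step would be a registered JdbcDialect on the JVM side, which this model does not attempt to implement.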