clickhouse-java
JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when writing an array of strings with Apache Spark (Scala)
Describe the bug
When using Spark to write an array of strings to ClickHouse, the driver throws `java.lang.IllegalArgumentException: Unknown data type: string`.
Reasons:
- The exception is thrown by https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238, because `String` is a case-sensitive type.
- This is caused by Spark's JDBC utils converting the type name to lower case (`String` -> `string`): https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639
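The interaction between the two reasons above can be illustrated without Spark or ClickHouse. This is only a minimal sketch, not the code of either project: `sparkArrayTypeName` approximates the lower-casing step on the Spark side, and `knownTypes` / `lookupCaseSensitive` are hypothetical stand-ins for a case-sensitive type lookup like the one in `ClickHouseDataType`.

```scala
import java.util.Locale
import scala.util.Try

object CaseSensitiveLookupSketch {
  // Hypothetical stand-in for the driver's set of case-sensitive type names.
  val knownTypes: Set[String] = Set("String", "Array", "Nullable")

  // Hypothetical case-sensitive lookup: only an exact match is accepted.
  def lookupCaseSensitive(name: String): String =
    if (knownTypes.contains(name)) name
    else throw new IllegalArgumentException(s"Unknown data type: $name")

  // Assumed approximation of the Spark step: the element type name is
  // lower-cased before it is handed to the JDBC driver.
  def sparkArrayTypeName(databaseTypeDefinition: String): String =
    databaseTypeDefinition.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val typeName = sparkArrayTypeName("String")
    // The exact name resolves, the lower-cased one does not.
    println(Try(lookupCaseSensitive("String")))
    println(Try(lookupCaseSensitive(typeName)))
  }
}
```

In this model the second lookup fails with the same message as in the error log, which is why the fix has to happen either where the name is lower-cased or where it is resolved.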
Steps to reproduce
- Create table with String Array field
- Write data to the table with Scala Spark
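// Code excerpt; requires a Scala Spark job (with `spark` and `sc` in scope)
// and the ClickHouse JDBC driver on the classpath
import java.util.Properties
import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Schema with a single array-of-strings column
val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)
val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)
val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  // The driver class name must be passed as a string
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)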
Expected behaviour
Error log
java.lang.IllegalArgumentException: Unknown data type: string
Configuration
Environment
- Client version: JDBC lib built from `main`, [this commit](https://github.com/ClickHouse/clickhouse-java/commit/aa3870eadb1a2d3675fd5119714c85851800f076)
- Language version: SDK corretto-1.8
- OS: Reproducible on Linux / macOS
ClickHouse server
- ClickHouse Server version: https://hub.docker.com/layers/clickhouse/clickhouse-server/23.11.2.11/
- ClickHouse Server non-default settings, if any:
- `CREATE TABLE` statements for tables involved:
CREATE DATABASE IF NOT EXISTS foo;
CREATE TABLE IF NOT EXISTS foo.bar
(
str_array Array(Nullable(String))
);
@phanhuyn Sorry for the delay. This issue depends on the Apache Spark PR. Do you know the status of that PR?
Hi @mzitnik, sorry for the delayed reply. The Spark PR is still pending review.
If the Spark PR is approved and merged, this ClickHouse PR is NOT required.
On the Spark side, lower-casing the type name might be intentional, so the Spark PR could be rejected.