clickhouse-java

JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when writing an array of strings with Apache Spark (Scala)

Open phanhuyn opened this issue 1 year ago • 2 comments

Describe the bug

When using Spark to write an array of strings to ClickHouse, the driver throws `java.lang.IllegalArgumentException: Unknown data type: string`.

Reasons:

  • The exception is thrown at https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238, because the `String` type name is case-sensitive.

  • The lowercase name comes from Spark's JdbcUtils, which converts the type name to lower case (String -> string): https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639
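
To make the interaction between the two points above concrete, here is a minimal sketch of how a lowercased type name fails an exact-match lookup. It is a simplified model, not the driver's or Spark's actual code: the object `CaseSensitiveLookupSketch`, the `caseSensitiveTypes` set, and `resolve` are hypothetical names introduced only for illustration.

```scala
import java.util.Locale

// Simplified, hypothetical model of a case-sensitive type-name lookup.
// It does not reproduce ClickHouseDataType; it only mirrors the failure mode.
object CaseSensitiveLookupSketch {
  // Hypothetical registry of case-sensitive type names (illustration only).
  private val caseSensitiveTypes: Set[String] = Set("String")

  // Exact-match lookup: a name that differs only in case is treated as unknown.
  def resolve(typeName: String): String =
    if (caseSensitiveTypes.contains(typeName)) typeName
    else throw new IllegalArgumentException(s"Unknown data type: $typeName")

  def main(args: Array[String]): Unit = {
    // The original name resolves.
    println(resolve("String")) // prints "String"

    // Lowercasing the name first, as described for Spark above, breaks the lookup.
    val lowered = "String".toLowerCase(Locale.ROOT)
    try {
      println(resolve(lowered))
    } catch {
      case e: IllegalArgumentException =>
        println(e.getMessage) // prints "Unknown data type: string"
    }
  }
}
```

Under this model, either side could avoid the failure: the caller by preserving the original case, or the lookup by accepting the name case-insensitively.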

Steps to reproduce

  1. Create a table with a String array column (see the CREATE TABLE statement under "ClickHouse server")
  2. Write data to the table with Scala Spark
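    // Excerpt: run it inside a Scala Spark job with the ClickHouse JDBC driver on the classpath.
    import java.util.Properties
    import org.apache.spark.sql.{Row, SaveMode}
    import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

    // `spark` is assumed to be an existing SparkSession.
    val clickHouseSchema = StructType(
      Seq(
        StructField("str_array", ArrayType(StringType))
      )
    )
    val data = Seq(
      Row(
        Seq("a", "b")
      )
    )

    val clickHouseDf = spark.createDataFrame(spark.sparkContext.parallelize(data), clickHouseSchema)

    val props = new Properties
    props.put("user", "default")
    clickHouseDf.write
      .mode(SaveMode.Append)
      // The driver option takes the class name as a string.
      .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
      .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)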

Expected behaviour

Error log

java.lang.IllegalArgumentException: Unknown data type: string

Configuration

Environment

  • Client version: JDBC lib built from main at this commit: https://github.com/ClickHouse/clickhouse-java/commit/aa3870eadb1a2d3675fd5119714c85851800f076
  • Language version: SDK corretto-1.8
  • OS: Reproducible on Linux / macOS

ClickHouse server

  • ClickHouse Server version: https://hub.docker.com/layers/clickhouse/clickhouse-server/23.11.2.11/
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
CREATE DATABASE IF NOT EXISTS foo;
CREATE TABLE IF NOT EXISTS foo.bar
(
    str_array  Array(Nullable(String))
);

phanhuyn • Dec 22 '23 03:12

@phanhuyn Sorry for the delay. This issue depends on the Apache Spark PR. Do you know the status of that PR?

mzitnik • Jan 01 '24 08:01

Hi @mzitnik, sorry for the delayed reply. The Spark PR is still pending review.

If the Spark PR is approved and merged, the ClickHouse PR is NOT required.

On the Spark side, lowercasing the type name might be intentional, so the Spark PR could be rejected.

phanhuyn • Jan 07 '24 23:01