clickhouse-java
Error when writing array<string> column of dataframe
Hi
I am reading data from MongoDB using the spark-mongo-connector into a DataFrame. One of the columns is array<string>, and when writing the DataFrame to ClickHouse I get the following error:
java.lang.IllegalArgumentException: Can't get JDBC type for array<string>
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:174)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:635)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The DataFrame I am writing is as follows:
+---------+-------------+----------+----------+-------------+
|member_id|mobile_number|   updated|   created| array_string|
+---------+-------------+----------+----------+-------------+
|        1|   1234567890|1970-01-01|1970-01-01|[hello,world]|
+---------+-------------+----------+----------+-------------+
The following is the ClickHouse table schema:
CREATE TABLE test_data (
member_id String,
mobile_number Nullable(String),
updated Nullable(String),
created Date,
array_string Array(String)
) ENGINE = MergeTree(created, member_id, 8192)
If I try making the column array_string a plain string like "hello,world" and insert it, I get the following error:
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 27, host: 127.0.0.1, port: 8123; Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected [ before: hello,world\n: (at row 1)
Row 1:
Column 0, name: member_id, type: String, parsed text: "1"
Column 1, name: mobile_number, type: Nullable(String), parsed text: "1234567890"
Column 2, name: updated, type: Nullable(String), parsed text: "1970-01-01"
Column 3, name: created, type: Date, parsed text: "1970-01-01"
Column 4, name: array_string, type: Array(String), parsed text: <EMPTY>ERROR
, e.what() = DB::Exception
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:723)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:699)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:682)
at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:382)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:659)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Throwable: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected [ before: hello,world\n: (at row 1)
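For context on the parse failure above: for an Array(String) column, ClickHouse's input parser expects an array literal like ['hello','world'], which is why a bare hello,world string fails at the expected [. Below is a minimal sketch of a workaround, assuming the legacy driver passes the bound string through to the server's parser unmodified; the UDF name and the df reference are illustrative, not from this issue.

import org.apache.spark.sql.functions.{col, udf}

// Hypothetical workaround: render the array column as a ClickHouse array
// literal such as ['hello','world'] before writing, escaping backslashes
// and single quotes inside each element.
val toArrayLiteral = udf { xs: Seq[String] =>
  xs.map(x => "'" + x.replace("\\", "\\\\").replace("'", "\\'") + "'")
    .mkString("[", ",", "]")
}

// df is the DataFrame from the report above
val out = df.withColumn("array_string", toArrayLiteral(col("array_string")))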
As per my investigation, I think the ClickHouse JDBC driver doesn't accept an array<string> column from a DataFrame.
I am not sure how to fix this. If any of you can help me figure out this issue or provide a solution, that would be helpful.
Thanks
I am also facing the same issue
I have a similar issue, but with array<bigint>:
pyspark.sql.utils.IllegalArgumentException: "Can't get JDBC type for array<bigint>"
If you look closely, the stack trace shows a Spark error, not a JDBC error.
@honavar-sohan how are you setting the parameter values? As far as I can see, the JDBC driver works fine; see the examples in PR #302.
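For anyone who wants to check the driver in isolation, here is a minimal sketch of binding an array directly over JDBC. It assumes createArrayOf/setArray is the array-binding path exercised in PR #302; the connection URL and values are illustrative, matching the table from this issue.

import java.sql.DriverManager

// Standalone check of the legacy ru.yandex.clickhouse driver; the URL
// and database here are assumptions.
val conn = DriverManager.getConnection("jdbc:clickhouse://127.0.0.1:8123/default")
try {
  val ps = conn.prepareStatement(
    "INSERT INTO test_data (member_id, created, array_string) VALUES (?, ?, ?)")
  ps.setString(1, "1")
  ps.setDate(2, java.sql.Date.valueOf("1970-01-01"))
  // createArrayOf is assumed to be supported, as tested in PR #302
  ps.setArray(3, conn.createArrayOf("String", Array[AnyRef]("hello", "world")))
  ps.addBatch()
  ps.executeBatch()
} finally conn.close()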
The same problem also affects columns of MapType:
IllegalArgumentException: Can't get JDBC type for map<string,string>
I am not quite sure, though, whether that's a problem in Spark or in ClickHouse...
Any solution?
The JDBC driver supports Map, but unfortunately we don't have a dialect implemented for Spark.
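For reference, the missing piece on the Spark side would be a custom JdbcDialect. Here is a minimal sketch, assuming a dialect object and type strings that are not part of Spark or this driver. Note that once the type maps, stock Spark can bind ArrayType values through Connection.createArrayOf, but it has no row-level setter for MapType, so Map would need more than a type mapping.

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Sketch of a ClickHouse dialect: maps Spark types to ClickHouse DDL types
// so JdbcUtils.getJdbcType no longer throws for array columns.
object ClickHouseDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:clickhouse")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case ArrayType(StringType, _) =>
      Some(JdbcType("Array(String)", java.sql.Types.ARRAY))
    case _ => None // fall back to Spark's default mappings
  }
}

// Register before calling df.write.jdbc(...)
JdbcDialects.registerDialect(ClickHouseDialect)

Registering this before the write should at least get past the "Can't get JDBC type" error for array columns; whether the driver then accepts the array Spark binds is a separate question.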
Is there any update on this? I am still in need of Map and Array types :)