
Error when writing array<string> column of dataframe


Hi,

I am reading data from MongoDB into a DataFrame using the spark-mongo-connector. One of the columns is array<string>, and when I write the DataFrame to ClickHouse I get the following error:

java.lang.IllegalArgumentException: Can't get JDBC type for array<string>
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:175)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:174)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$20.apply(JdbcUtils.scala:635)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:635)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The DataFrame I am writing looks like this:

+---------+-------------+----------+----------+-------------+
|member_id|mobile_number|   updated|   created| array_string|
+---------+-------------+----------+----------+-------------+
|        1|   1234567890|1970-01-01|1970-01-01|[hello,world]|
+---------+-------------+----------+----------+-------------+

The ClickHouse table schema is:

CREATE TABLE test_data (
  member_id String,
  mobile_number Nullable(String),
  updated Nullable(String),
  created Date,
  array_string Array(String)
) ENGINE = MergeTree(created, member_id, 8192)

If I instead make the array_string column a plain string like "hello,world" and insert it, I get the following error:

ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 27, host: 127.0.0.1, port: 8123; Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected [ before: hello,world\n: (at row 1)

Row 1:
Column 0,   name: member_id,        type: String,           parsed text: "1"
Column 1,   name: mobile_number,    type: Nullable(String), parsed text: "1234567890"
Column 2,   name: updated,          type: Nullable(String), parsed text: "1970-01-01"
Column 3,   name: created,          type: Date,             parsed text: "1970-01-01"
Column 4,   name: array_string,     type: Array(String),    parsed text: <EMPTY>ERROR

, e.what() = DB::Exception

	at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
	at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:723)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:699)
	at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:682)
	at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:382)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:659)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:821)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Throwable: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected [ before: hello,world\n: (at row 1)

Based on my investigation, I think the ClickHouse JDBC write path does not accept array<string> columns from a DataFrame.

I am not sure how to fix this; if anyone can help me figure out this issue or suggest a solution, that would be helpful.

Thanks

honavar-sohan · Aug 24 '18
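
The second error above hints at a workaround: the driver sends values as text, so ClickHouse expects an Array(String) literal such as ['hello','world'] rather than a bare "hello,world". A minimal Scala sketch of that idea, with df standing in for the DataFrame shown above; it is naive about empty arrays and elements containing single quotes:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Render the array<string> column as a ClickHouse array literal,
// e.g. Seq("hello", "world") -> "['hello','world']", so the column
// becomes a plain StringType that Spark's JDBC writer can handle.
// Caveat: yields "['']" for empty arrays and does not escape single
// quotes inside elements.
def withArrayLiteral(df: DataFrame): DataFrame =
  df.withColumn(
    "array_string",
    concat(lit("['"), concat_ws("','", col("array_string")), lit("']")))

The transformed DataFrame then writes through df.write.format("jdbc") as usual, since every column is now a primitive type.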

I am also facing the same issue.

arnaboss · Aug 24 '18

I have a similar issue, but with an array<bigint> column:

pyspark.sql.utils.IllegalArgumentException: "Can't get JDBC type for array<bigint>"

If you look closely, the stack trace shows a Spark error, not a JDBC error.

seufagner · Oct 15 '18
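
Indeed, the exception is thrown before any JDBC call is made: the dialect Spark selects for a ClickHouse URL has no mapping for ArrayType, so the type lookup fails inside Spark itself. A small Scala demonstration of this, assuming Spark on the classpath (the URL is only illustrative):

import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.{ArrayType, StringType}

// Spark falls back to a no-op dialect for JDBC URLs it does not
// recognize, and that dialect maps no types at all, so getJDBCType
// returns None and JdbcUtils throws "Can't get JDBC type for
// array<string>" downstream.
val dialect = JdbcDialects.get("jdbc:clickhouse://127.0.0.1:8123/default")
println(dialect.getJDBCType(ArrayType(StringType))) // prints None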

@honavar-sohan how are you setting the parameter values? As far as I can see, the JDBC driver works fine; see the examples in PR #302.

enqueue · Feb 12 '19
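
A sketch of what PR #302 demonstrates, going through the driver directly with plain JDBC rather than through Spark. The URL is an assumption, and whether createArrayOf accepts "String" as the element type name may depend on the driver version:

import java.sql.DriverManager

// Insert one row, binding the Array(String) column through the standard
// java.sql.Array API instead of relying on Spark's type mapping.
val conn = DriverManager.getConnection("jdbc:clickhouse://127.0.0.1:8123/default")
try {
  val stmt = conn.prepareStatement(
    "INSERT INTO test_data (member_id, mobile_number, updated, created, array_string) " +
      "VALUES (?, ?, ?, ?, ?)")
  stmt.setString(1, "1")
  stmt.setString(2, "1234567890")
  stmt.setString(3, "1970-01-01")
  stmt.setDate(4, java.sql.Date.valueOf("1970-01-01"))
  stmt.setArray(5, conn.createArrayOf("String", Array[AnyRef]("hello", "world")))
  stmt.addBatch()
  stmt.executeBatch()
} finally {
  conn.close()
}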

The same problem also affects columns of MapType:

IllegalArgumentException: Can't get JDBC type for map<string,string>

I am not quite sure, though, whether that is a problem in Spark or in ClickHouse...

harpaj · Sep 30 '21

Any solution?

ciazhar · Mar 28 '23

The JDBC driver supports Map, but unfortunately we don't have a dialect implemented for Spark.

zhicwu · Mar 29 '23
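
Until an official dialect exists, a hand-rolled one can at least unblock array columns. A minimal Scala sketch, an illustration rather than a tested implementation:

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Custom dialect so Spark's type lookup no longer throws for
// array<string> when writing to a ClickHouse JDBC URL.
object ClickHouseDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:clickhouse")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType               => Some(JdbcType("String", Types.VARCHAR))
    case ArrayType(StringType, _) => Some(JdbcType("Array(String)", Types.ARRAY))
    case _                        => None
  }
}

// Register once, before any DataFrame.write against a ClickHouse URL.
JdbcDialects.registerDialect(ClickHouseDialect)

Note that this only fixes the type lookup. Spark still has to bind the value at write time: it binds ArrayType through Connection.createArrayOf, which the driver must support, and it has no built-in setter for MapType at all, so map columns would still fail even with a dialect registered.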

Is there any update on this? I am still in need of Map and Array types :)

AkhtemWays · Aug 24 '23