Need to insert an Array of Integers into ClickHouse Table Column from Spark :: Can't get JDBC type for array<int>
Environment
Windows 11, Spark 3.1.2, Python 3.7
What I am doing
I have a DataFrame that contains an array of integers in one column. I am inserting this DataFrame into ClickHouse using the jar clickhouse-jdbc-0.3.2-patch11.jar and the driver com.github.housepower.jdbc.ClickHouseDriver.
Error I am getting
Can't get JDBC type for array<int>
Code Snippet
source_df.write \
    .format("jdbc") \
    .mode("overwrite") \
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
    .option("url", os.environ["clickhous_url"]) \
    .option("createTableOptions",
            # adjacent string literals concatenate with no separator,
            # so each fragment needs a trailing space
            "engine=MergeTree() "
            "primary key (UserID) "
            "order by (UserID)") \
    .option("user", "clickhouse_user") \
    .option("password", "xxxxxxxxx") \
    .option("dbtable", "User_Table") \
    .option("batchsize", "20000") \
    .option("truncate", "true") \
    .save()
I have also tried com.github.housepower:clickhouse-native-jdbc, but it gives the same error.
Can anyone help me with this?
You can register the ClickHouse dialect before writing to ClickHouse:

import org.apache.spark.sql.jdbc.{ClickHouseDialect, JdbcDialects}
JdbcDialects.registerDialect(ClickHouseDialect)

ClickHouseDialect comes from:

<groupId>com.github.housepower</groupId>
<artifactId>clickhouse-integration-spark_${scala.binary.version}</artifactId>
I tried adding this to my PySpark code, but I couldn't get it right. Can you please help me with this?
from py4j.java_gateway import java_import
gw = self.spark.sparkContext._gateway
# The dialect lives in the org.apache.spark.sql.jdbc package (see the Scala
# import above); hyphens are not valid in Java package names
java_import(gw.jvm, "org.apache.spark.sql.jdbc.ClickHouseDialect")
# ClickHouseDialect is a Scala object, so its singleton instance is exposed
# to py4j as the static MODULE$ field of the ClickHouseDialect$ class
dialect = getattr(gw.jvm.org.apache.spark.sql.jdbc, "ClickHouseDialect$").MODULE$
gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(dialect)
Any update on this?
I'm also facing this problem. Is there any way to work around it?
For now, we proceed as follows:
- Create a tmp table in ClickHouse with a String column in place of the Array column
- Write the data to the tmp table as strings (the integers joined with a comma separator)
- Run an INSERT query like this:
INSERT INTO table
SELECT
col1,
col2,
...,
IF(
string_column != '', -- if string_column value is not empty string
arrayMap(
x -> toUInt32(x),
splitByChar(
',', -- split by comma
string_column
)
),
[]
) AS array_column
FROM table_tmp
WHERE cond
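The round-trip behind this workaround can be sketched in plain Python (the helper names here are mine, for illustration only): joining the integers with commas before the write, and the inverse of the IF/splitByChar/arrayMap expression above:

```python
def ints_to_csv(values):
    """Serialize a list of ints into the comma-separated string stored
    in the tmp table's String column (empty list -> empty string)."""
    return ",".join(str(v) for v in values)

def csv_to_ints(s):
    """Python equivalent of the SQL above: IF the string is non-empty,
    splitByChar(',') then arrayMap(toUInt32); otherwise []."""
    return [int(x) for x in s.split(",")] if s != "" else []

# round-trip examples
print(ints_to_csv([1, 2, 3]))  # -> "1,2,3"
print(csv_to_ints("1,2,3"))    # -> [1, 2, 3]
print(csv_to_ints(""))         # -> []
```

Note the empty-string check: without it, splitting `''` would yield a one-element array containing an empty string, and `toUInt32('')` would fail, which is why the SQL guards with IF.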
Is there any update on this? I'm still experiencing issues while trying to import data into ClickHouse (v22.2.2.1) using the housepower driver. Here's the guide I'm following: https://housepower.github.io/ClickHouse-Native-JDBC/guide/spark_integration.html#integration-with-spark