clickhouse-java Need to insert an Array of Integers into ClickHouse Table Column from Spark

OS

Windows 11 Spark 3.1.2 Python 3.7

What I am doing ?

I have a dataframe that contains an Array of Integers in a column. I am inserting this dataframe into ClickHouse using jar clickhouse-jdbc-0.3.2-patch11.jar and driver com.github.housepower.jdbc.ClickHouseDriver.

Error I am getting

Can't get JDBC type for array<int>

Code Snippet

source_df.write \
            .format("jdbc") \
            .mode("overwrite") \
            .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
            .option("url", os.environ["clickhous_url"]) \
            .option("createTableOptions", "engine=MergeTree()"
                                          "primary key (UserID)"
                                          "order by (UserID)") \
            .option("user", "clickhouse_user") \
            .option("password", "xxxxxxxxx") \
            .option("dbtable", "User_Table") \
            .option("batchsize", "20000") \
            .option("truncate", "true") \
            .save()

I have tried with com.github.housepower:clickhouse-native-jdbc but It is also giving the same issue.

Can anyone help me on this ?

Sep 23 '22 07:09 swarupsarangi113

you can add this JdbcDialects.registerDialect(ClickHouseDialect) before write to clickhouse.. import org.apache.spark.sql.jdbc.{ClickHouseDialect, JdbcDialects} ClickHouseDialect comes from <groupId>com.github.housepower</groupId> <artifactId>clickhouse-integration-spark_${scala.binary.version}</artifactId>

Oct 11 '22 09:10 shaokunW-hotstar

I tried adding this to my pyspark code, however I couldn't get it right. Can you please help me on this ?

       from py4j.java_gateway import java_import


        gw = self.spark.sparkContext._gateway
        java_import(gw.jvm, "com.github.housepower.clickhouse-integration-spark_2.12.ClickHouseDialect")
        gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(ClickHouseDialect)

Oct 17 '22 12:10 swarupsarangi113

any update on this

Aug 07 '24 06:08 AbhishekOnedigital

I'm also faced this problem, is there any way to get around the problem?

Now we we proceed as follows:

Create tmp table in CH with string type column that must be array type
Write data to tmp table with string values (integers, concatenated with comma separator)
Run INSERT SQL query like this

INSERT INTO table
SELECT 
    col1, 
    col2, 
    ..., 
    IF(
            string_column != '', -- if string_column value is not empty string
            arrayMap(
                x -> toUInt32(x), 
                splitByChar(
                    ',', -- split by comma
                    string_column
                )
            ),
            []
        ) AS array_column
FROM table_tmp
WHERE cond

Oct 08 '24 06:10 GolikovAndrey

Is there any update on this? I'm still experiencing issues while trying to import data into ClickHouse (v22.2.2.1) using Horsepower. Here's the guide I'm following: https://housepower.github.io/ClickHouse-Native-JDBC/guide/spark_integration.html#integration-with-spark

Oct 22 '24 07:10 ttdpro98

Need to insert an Array of Integers into ClickHouse Table Column from Spark :: Can't get JDBC type for array<int>

OS

What I am doing ?

Error I am getting

Code Snippet