clickhouse-java icon indicating copy to clipboard operation
clickhouse-java copied to clipboard

Need to insert an Array of Integers into ClickHouse Table Column from Spark :: Can't get JDBC type for array<int>

Open swarupsarangi113 opened this issue 3 years ago • 5 comments

OS

Windows 11 Spark 3.1.2 Python 3.7

What I am doing ?

I have a dataframe that contains an Array of Integers in a column. I am inserting this dataframe into ClickHouse using jar clickhouse-jdbc-0.3.2-patch11.jar and driver com.github.housepower.jdbc.ClickHouseDriver.

Error I am getting

Can't get JDBC type for array<int>

Code Snippet

source_df.write \
            .format("jdbc") \
            .mode("overwrite") \
            .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
            .option("url", os.environ["clickhous_url"]) \
            .option("createTableOptions", "engine=MergeTree()"
                                          "primary key (UserID)"
                                          "order by (UserID)") \
            .option("user", "clickhouse_user") \
            .option("password", "xxxxxxxxx") \
            .option("dbtable", "User_Table") \
            .option("batchsize", "20000") \
            .option("truncate", "true") \
            .save()

I have tried with com.github.housepower:clickhouse-native-jdbc but It is also giving the same issue.

Can anyone help me on this ?

swarupsarangi113 avatar Sep 23 '22 07:09 swarupsarangi113

you can add this JdbcDialects.registerDialect(ClickHouseDialect) before write to clickhouse.. import org.apache.spark.sql.jdbc.{ClickHouseDialect, JdbcDialects} ClickHouseDialect comes from <groupId>com.github.housepower</groupId> <artifactId>clickhouse-integration-spark_${scala.binary.version}</artifactId>

shaokunW-hotstar avatar Oct 11 '22 09:10 shaokunW-hotstar

I tried adding this to my pyspark code, however I couldn't get it right. Can you please help me on this ?

       from py4j.java_gateway import java_import


        gw = self.spark.sparkContext._gateway
        java_import(gw.jvm, "com.github.housepower.clickhouse-integration-spark_2.12.ClickHouseDialect")
        gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(ClickHouseDialect)

swarupsarangi113 avatar Oct 17 '22 12:10 swarupsarangi113

any update on this

AbhishekOnedigital avatar Aug 07 '24 06:08 AbhishekOnedigital

I'm also faced this problem, is there any way to get around the problem?

Now we we proceed as follows:

  1. Create tmp table in CH with string type column that must be array type
  2. Write data to tmp table with string values (integers, concatenated with comma separator)
  3. Run INSERT SQL query like this
INSERT INTO table
SELECT 
    col1, 
    col2, 
    ..., 
    IF(
            string_column != '', -- if string_column value is not empty string
            arrayMap(
                x -> toUInt32(x), 
                splitByChar(
                    ',', -- split by comma
                    string_column
                )
            ),
            []
        ) AS array_column
FROM table_tmp
WHERE cond

GolikovAndrey avatar Oct 08 '24 06:10 GolikovAndrey

Is there any update on this? I'm still experiencing issues while trying to import data into ClickHouse (v22.2.2.1) using Horsepower. Here's the guide I'm following: https://housepower.github.io/ClickHouse-Native-JDBC/guide/spark_integration.html#integration-with-spark

ttdpro98 avatar Oct 22 '24 07:10 ttdpro98