ClickHouse-Native-JDBC
Caused by: com.github.housepower.exception.ClickHouseSQLException: DB::ExceptionDB::Exception: Unexpected packet Data received from client.
I am trying to load data from S3 (around 60 million records) into ClickHouse using PySpark, and I am getting the error below.
Environment
- OS version: AWS EMR 6.3.0
- ClickHouse Server version: 21.8.10
- ClickHouse Native JDBC version: 2.6.5
- (Optional) Spark version: 3.1.1-amzn-0
Error logs
Caused by: com.github.housepower.exception.ClickHouseSQLException: DB::ExceptionDB::Exception: Unexpected packet Data received from client. Stack trace:
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8fd5d9a in /usr/bin/clickhouse
1. DB::TCPHandler::receiveUnexpectedData() @ 0x1103b729 in /usr/bin/clickhouse
2. DB::TCPHandler::receivePacket() @ 0x11031232 in /usr/bin/clickhouse
3. DB::TCPHandler::runImpl() @ 0x110298ea in /usr/bin/clickhouse
4. DB::TCPHandler::run() @ 0x1103cd39 in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x13bb428f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x13bb5d1a in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x13ce7f99 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x13ce422a in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
at com.github.housepower.protocol.ExceptionResponse.readExceptionFrom(ExceptionResponse.java:36)
at com.github.housepower.protocol.Response.readFrom(Response.java:35)
at com.github.housepower.client.NativeClient.receiveResponse(NativeClient.java:183)
at com.github.housepower.client.NativeClient.receiveEndOfStream(NativeClient.java:134)
at com.github.housepower.jdbc.ClickHouseConnection.sendInsertRequest(ClickHouseConnection.java:294)
at com.github.housepower.jdbc.statement.ClickHousePreparedInsertStatement.close(ClickHousePreparedInsertStatement.java:167)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:695)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
Code Snippet
def load_into_clickhouse(self, dataframe, dest_db_name, dest_table_name):
    dataframe.write \
        .format("jdbc") \
        .mode("overwrite") \
        .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
        .option("url", self.url) \
        .option("createTableOptions", "engine=MergeTree()"
                                      "primary key (jobid)"
                                      "order by (jobid, toYYYYMM(created_dt), toYYYYMM(modified_dt))"
                                      "partition by (stages, toYYYYMM(created_dt))") \
        .option("user", self.user_name) \
        .option("password", self.password) \
        .option("dbtable", f"{dest_db_name}.{dest_table_name}") \
        .option("batchsize", "5000") \
        .option("truncate", "true") \
        .save()
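While writing this up I noticed that the adjacent string literals in createTableOptions are concatenated by Python with no whitespace between them, so the server receives one run-together string such as engine=MergeTree()primary key (jobid)order by (.... I am not sure whether this is related to the error, but a version that joins the clauses with explicit spaces would look like this:

# Adjacent Python string literals concatenate with no separating spaces;
# building the options with " ".join() keeps each clause separated.
create_table_options = " ".join([
    "engine=MergeTree()",
    "primary key (jobid)",
    "order by (jobid, toYYYYMM(created_dt), toYYYYMM(modified_dt))",
    "partition by (stages, toYYYYMM(created_dt))",
])
# then: .option("createTableOptions", create_table_options)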
I find this weird: when I use a single column in PARTITION BY I do not get this error, but as soon as I use multiple columns in the partition expression it fails with the exception above.

Can anyone help me with this?
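One workaround I have been considering (unverified; the function name below is just for illustration): create the table manually in ClickHouse with the multi-column PARTITION BY, then write with mode("append") so Spark never generates the CREATE TABLE DDL itself. Note that the truncate option only applies in overwrite mode, so the table would have to be truncated separately if needed.

def load_into_precreated_table(self, dataframe, dest_db_name, dest_table_name):
    # Assumes dest_db_name.dest_table_name already exists in ClickHouse with the
    # desired engine, ORDER BY, and multi-column PARTITION BY, so Spark only inserts.
    dataframe.write \
        .format("jdbc") \
        .mode("append") \
        .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
        .option("url", self.url) \
        .option("user", self.user_name) \
        .option("password", self.password) \
        .option("dbtable", f"{dest_db_name}.{dest_table_name}") \
        .option("batchsize", "5000") \
        .save()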