ClickHouse-Native-JDBC
Caused by: com.github.housepower.exception.ClickHouseSQLException: DB::ExceptionDB::Exception: Unexpected packet Data received from client.
I am trying to load data from S3 (around 60 million records) into ClickHouse using PySpark, and I am getting the error below.
Environment
- OS version: AWS EMR 6.3.0
- ClickHouse Server version: 21.8.10
- ClickHouse Native JDBC version: 2.6.5
- (Optional) Spark version: 3.1.1-amzn-0
Error logs
Caused by: com.github.housepower.exception.ClickHouseSQLException: DB::ExceptionDB::Exception: Unexpected packet Data received from client. Stack trace:
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8fd5d9a in /usr/bin/clickhouse
1. DB::TCPHandler::receiveUnexpectedData() @ 0x1103b729 in /usr/bin/clickhouse
2. DB::TCPHandler::receivePacket() @ 0x11031232 in /usr/bin/clickhouse
3. DB::TCPHandler::runImpl() @ 0x110298ea in /usr/bin/clickhouse
4. DB::TCPHandler::run() @ 0x1103cd39 in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x13bb428f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x13bb5d1a in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x13ce7f99 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x13ce422a in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
at com.github.housepower.protocol.ExceptionResponse.readExceptionFrom(ExceptionResponse.java:36)
at com.github.housepower.protocol.Response.readFrom(Response.java:35)
at com.github.housepower.client.NativeClient.receiveResponse(NativeClient.java:183)
at com.github.housepower.client.NativeClient.receiveEndOfStream(NativeClient.java:134)
at com.github.housepower.jdbc.ClickHouseConnection.sendInsertRequest(ClickHouseConnection.java:294)
at com.github.housepower.jdbc.statement.ClickHousePreparedInsertStatement.close(ClickHousePreparedInsertStatement.java:167)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:695)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
Code Snippet
def load_into_clickhouse(self, dataframe, dest_db_name, dest_table_name):
    dataframe.write \
        .format("jdbc") \
        .mode("overwrite") \
        .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
        .option("url", self.url) \
        .option("createTableOptions", "engine=MergeTree()"
                                      "primary key (jobid)"
                                      "order by (jobid, toYYYYMM(created_dt), toYYYYMM(modified_dt))"
                                      "partition by (stages, toYYYYMM(created_dt))") \
        .option("user", self.user_name) \
        .option("password", self.password) \
        .option("dbtable", f"{dest_db_name}.{dest_table_name}") \
        .option("batchsize", "5000") \
        .option("truncate", "true") \
        .save()
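While writing this up I noticed that the adjacent string literals in createTableOptions are concatenated by Python with no whitespace between them, so the server receives one run-together string such as engine=MergeTree()primary key (jobid)order by (.... I am not sure whether this is related to the error, but a version that joins the clauses with explicit spaces would look like this:

# Adjacent Python string literals concatenate with no separating spaces;
# building the options with " ".join() keeps each clause separated.
create_table_options = " ".join([
    "engine=MergeTree()",
    "primary key (jobid)",
    "order by (jobid, toYYYYMM(created_dt), toYYYYMM(modified_dt))",
    "partition by (stages, toYYYYMM(created_dt))",
])
# then: .option("createTableOptions", create_table_options)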
I find this weird: when I use a single column in PARTITION BY I do not get this error, but as soon as I use multiple columns in the partition expression it fails with the exception above.

Can anyone help me with this?
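One workaround I have been considering (unverified; the function name below is just for illustration): create the table manually in ClickHouse with the multi-column PARTITION BY, then write with mode("append") so Spark never generates the CREATE TABLE DDL itself. Note that the truncate option only applies in overwrite mode, so the table would have to be truncated separately if needed.

def load_into_precreated_table(self, dataframe, dest_db_name, dest_table_name):
    # Assumes dest_db_name.dest_table_name already exists in ClickHouse with the
    # desired engine, ORDER BY, and multi-column PARTITION BY, so Spark only inserts.
    dataframe.write \
        .format("jdbc") \
        .mode("append") \
        .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
        .option("url", self.url) \
        .option("user", self.user_name) \
        .option("password", self.password) \
        .option("dbtable", f"{dest_db_name}.{dest_table_name}") \
        .option("batchsize", "5000") \
        .save()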