clickhouse-java
clickhouse-java copied to clipboard
Spark JDBC: DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime
Describe the bug
I'm trying to query data from Clickhouse using Spark jdbc connector.
I'm using some filters on timestamps. As a result I'm getting exception.
Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH)
Steps to reproduce
- Create clickhouse tables
- Run following Spark code
Expected behaviour
- Query run successfully
Code example
from pyspark.sql import SparkSession
# Set up the SparkSession to include ClickHouse as a custom catalog
spark = SparkSession.builder \
.appName("ClickHouse Catalog Example") \
.config("spark.jars.packages", "com.clickhouse:clickhouse-jdbc:0.6.5,org.apache.httpcomponents.client5:httpclient5:5.3.1") \
.config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog") \
.config("spark.sql.catalog.clickhouse.host", "clickhouse-kim.clickhouse.svc") \
.config("spark.sql.catalog.clickhouse.http_port", "8123") \
.config("spark.sql.catalog.clickhouse.database", "telemetry") \
.config("spark.sql.catalog.clickhouse.driver", "com.clickhouse.jdbc.ClickHouseDriver") \
.config("spark.sql.catalog.clickhouse.user", "admin") \
.config("spark.sql.catalog.clickhouse.password", "admin") \
.getOrCreate()
jdbc_df = spark.read.jdbc("jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry", "telemetry.reference_peaks", properties={"user": "admin", "password": "admin", "driver": "com.clickhouse.jdbc.ClickHouseDriver"})
jdbc_df.createOrReplaceTempView("reference_peaks")
# Parameters ts_start and ts_end (example values)
ts_start = '2024-09-10 23:00:00'
ts_end = '2024-09-10 23:04:59'
# Spark SQL query
query = f"""
SELECT *
FROM reference_peaks
WHERE tts > to_timestamp('{ts_start}') - INTERVAL 100 SECOND
AND tts < to_timestamp('{ts_end}') + INTERVAL 100 SECOND
AND ts >= to_timestamp('{ts_start}')
AND ts < to_timestamp('{ts_end}')
"""
# Execute the query using Spark SQL
result_df = spark.sql(query)
# Show the dataframe (for verification)
result_df.count()
Error log
24[/10/04](http://localhost:8888/10/04) 10:48:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:223)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:286)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:275)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base[/java.util.concurrent.ThreadPoolExecutor.runWorker](http://localhost:8888/java.util.concurrent.ThreadPoolExecutor.runWorker)(ThreadPoolExecutor.java:1136)
at java.base[/java.util.concurrent.ThreadPoolExecutor](http://localhost:8888/java.util.concurrent.ThreadPoolExecutor)$Worker.run(ThreadPoolExecutor.java:635)
at java.base[/java.lang.Thread.run](http://localhost:8888/java.lang.Thread.run)(Thread.java:840)
24[/10/04](http://localhost:8888/10/04) 10:48:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.135 executor driver): java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
Configuration
Environment
- Client version: 0.6.5
- OS: MacOS
ClickHouse server
- ClickHouse Server version: 24.9.1.3278
CREATE TABLEstatements for tables involved:
CREATE TABLE telemetry.reference_peaks_shard ON CLUSTER '{cluster}'
(
hash_1 String,
hash_2 String,
ts DateTime,
offset Int32,
channel Int32,
station_id String,
tts DateTime,
batchid Int32,
org_ts DateTime
)
ENGINE = MergeTree
ORDER BY ts;
CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' AS telemetry.reference_peaks_shard
ENGINE = Distributed('{cluster}', 'telemetry', 'reference_peaks_shard', channel);
I have the same issue
Environment Client version: 0.8.2 OS: Linux (kubernetes) ARM node
ClickHouse server ClickHouse Server version: 24.8.0 select query: SELECT id ... FROM ... WHERE create_time >= '2025-03-18 16:18:43.727';