New JDBC driver version: TimeZone error when writing Date
Description
I am using JDBC driver 0.8.4 with Spark 3.5.1.
When writing Date-type data to ClickHouse via Spark JDBC, the time zone is converted automatically.
My region is Singapore, my Spark job is configured to UTC, and the ClickHouse server time zone is also UTC.
However, after my data is converted to the Date type with Spark's toDate function and written to ClickHouse, the driver treats the Spark date as Singapore time and then converts it to UTC again, even though the Spark job already runs in UTC.
No additional JDBC parameters are configured.
Finally, I switched to JDBC version 0.2.6, and the data is now written correctly.
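For reference, a minimal sketch of the write path described above; the host, database, source, and target table names here are placeholders, not the actual job configuration:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; "clickhouse-host", "source_table", and "target_table" are placeholders.
val spark = SparkSession.builder()
  .appName("clickhouse-date-repro")
  .config("spark.sql.session.timeZone", "UTC") // the job runs in UTC
  .getOrCreate()

// Convert the int column to a Date, as in the description.
val df = spark.sql("select to_date(cast(dt as string), 'yyyyMMdd') as date from source_table")

df.write
  .format("jdbc")
  .option("url", "jdbc:clickhouse://clickhouse-host:8123/default")
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .option("dbtable", "target_table")
  .mode("append")
  .save()
```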
Environment
- [ ] Cloud
- Client version: 0.8.4
- Language version: Java 11
- OS: Linux
ClickHouse Server
- ClickHouse Server version: 23.8
Good day, @Faceisme! Thank you for reporting the issue!
I have a few questions:
- Do you mean `java.sql.Date` or `java.util.Date`? What is the date field definition? Does it have a time zone?
- What is the data flow? Is it correct that Spark writes data in UTC and ClickHouse stores a UTC date, but when an application in Singapore reads it via JDBC, the driver treats the data as Singapore time and converts it to UTC? If so, would you please give a code example of how you read the data?
Thanks!
Spark SQL:

```sql
select
  to_date(cast(dt as string), 'yyyyMMdd') as date
from
  table
```
The dt field is originally an int; it is cast to a string with the cast function and then converted to a Date with Spark SQL's built-in to_date function. All data types are Spark SQL built-in types, and the implementation uses only Spark's native methods.
The DataFrame is then written to ClickHouse with the built-in JDBC writer:
```scala
df.write
  .format("jdbc")
  .options(finalOptions)
  .mode(mode)
  .save()
```
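For context, a sketch of what `finalOptions` might contain; everything below is a placeholder assumption except the standard Spark JDBC option keys:

```scala
// Hypothetical contents of finalOptions; host, table, and credentials are placeholders.
val finalOptions = Map(
  "url"      -> "jdbc:clickhouse://clickhouse-host:8123/default",
  "driver"   -> "com.clickhouse.jdbc.ClickHouseDriver",
  "dbtable"  -> "target_table",
  "user"     -> "default",
  "password" -> ""
)
```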
I am currently using driver version 0.7.2, which handles the time zone correctly. It is also the last version built on the JDBC V1 implementation; once you switch to a newer version that uses the JDBC V2 driver, the time zone becomes incorrect.
The data contains two fields representing time, both in UTC: dt and hour:
hour 2025080306
dt 20250803
Expected conversion result:
date is '2025-08-03'
However, after writing to ClickHouse, 8 hours are subtracted (Singapore is UTC+8), so the date becomes '2025-08-02'.
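For illustration, a minimal sketch of the kind of double conversion that would produce exactly this one-day shift; this is an assumption about the cause, not the driver's actual code:

```scala
import java.time.{LocalDate, ZoneId}

// A UTC date interpreted as Singapore-local midnight, then converted back
// to UTC, lands 8 hours earlier on the previous calendar day.
val date        = LocalDate.parse("2025-08-03")
val asSingapore = date.atStartOfDay(ZoneId.of("Asia/Singapore"))    // 2025-08-03T00:00+08:00
val backToUtc   = asSingapore.withZoneSameInstant(ZoneId.of("UTC")) // 2025-08-02T16:00Z
println(backToUtc.toLocalDate)                                      // 2025-08-02
```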
@Faceisme, thank you for the response!
I think this should be enough to reproduce the problem and fix it.