UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: UInt64, id: OTHER. SQLSTATE: 42704
Description
The Spark JDBC data source fails to recognize the UInt64 data type reported by ClickHouse, resulting in a SparkSQLException. This prevents reading any table that contains a UInt64 column. The issue appears with clickhouse-jdbc driver versions newer than 0.8.2.
https://github.com/ClickHouse/clickhouse-java/issues/1042 looks like a similar issue, but here it occurs for the UInt* types.
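For context, the failure happens because the driver reports the column as java.sql.Types.OTHER (id 1111), which Spark's JdbcUtils.getCatalystType has no mapping for. A minimal diagnostic sketch, assuming a classic (non-Connect) PySpark session where the spark._jvm gateway is available and the driver jar is on the JVM classpath; the connection details are the placeholders from the reproduction below:

# Diagnostic sketch: ask the ClickHouse JDBC driver what java.sql type it
# reports for the UInt64 column. Assumes a classic (non-Connect) PySpark
# session (spark._jvm available) and the driver jar on the classpath.
jvm = spark._jvm
conn = jvm.java.sql.DriverManager.getConnection(
    "jdbc:clickhouse://ip_address:port/spark_data", "user", "password")
try:
    rs = conn.createStatement().executeQuery(
        "SELECT id FROM spark_data.test_uint64_table LIMIT 0")
    md = rs.getMetaData()
    # Expected: name "UInt64" with id 1111 (java.sql.Types.OTHER), the
    # combination that Spark's JdbcUtils.getCatalystType rejects.
    print(md.getColumnTypeName(1), md.getColumnType(1))
finally:
    conn.close()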
Steps to reproduce
1. Set up an Apache Spark 4.0 cluster.
2. In a ClickHouse database (version 25.4.3.22), create a table containing a UInt64 column.
3. Attempt to read the table through Spark's JDBC data source using clickhouse-jdbc driver version 0.9.0.
Error Log or Exception StackTrace
UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: UInt64, id: OTHER. SQLSTATE: 42704
JVM stacktrace:
org.apache.spark.SparkSQLException
at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:977)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:241)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:325)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:325)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.$anonfun$getQueryOutputSchema$3(JDBCRDD.scala:72)
at scala.util.Using$.resource(Using.scala:296)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.$anonfun$getQueryOutputSchema$2(JDBCRDD.scala:70)
at scala.util.Using$.resource(Using.scala:296)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.$anonfun$getQueryOutputSchema$1(JDBCRDD.scala:68)
at scala.util.Using$.resource(Using.scala:296)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:67)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:243)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:38)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:361)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.org$apache$spark$sql$catalyst$analysis$ResolveDataSource$$loadV1BatchSource(ResolveDataSource.scala:143)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.$anonfun$applyOrElse$2(ResolveDataSource.scala:61)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:61)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:45)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:139)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:139)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:416)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:135)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:131)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:112)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:111)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:45)
at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:43)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:242)
at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)
at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)
at scala.collection.immutable.List.foldLeft(List.scala:79)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:239)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:231)
at scala.collection.immutable.List.foreach(List.scala:334)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:231)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:290)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:286)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:234)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:286)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:249)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:201)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:201)
at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.resolveInFixedPoint(HybridAnalyzer.scala:190)
at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.$anonfun$apply$1(HybridAnalyzer.scala:76)
at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.withTrackedAnalyzerBridgeState(HybridAnalyzer.scala:111)
at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.apply(HybridAnalyzer.scala:71)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:280)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:423)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:280)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$2(QueryExecution.scala:110)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:278)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:654)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:278)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:277)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$1(QueryExecution.scala:110)
at scala.util.Try$.apply(Try.scala:217)
at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1378)
at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1439)
at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:121)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:80)
at org.apache.spark.sql.classic.Dataset$.$anonfun$ofRows$1(Dataset.scala:115)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.Dataset$.ofRows(Dataset.scala:113)
at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:109)
at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:92)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformReadRel(SparkConnectPlanner.scala:1409)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformRelation$1(SparkConnectPlanner.scala:152)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$3(SessionHolder.scala:477)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.connect.service.SessionHolder.usePlanCache(SessionHolder.scala:476)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:147)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:133)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformShowString(SparkConnectPlanner.scala:306)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformRelation$1(SparkConnectPlanner.scala:150)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$3(SessionHolder.scala:477)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.connect.service.SessionHolder.usePlanCache(SessionHolder.scala:476)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:147)
at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.handlePlan(SparkConnectPlanExecution.scala:74)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handlePlan(ExecuteThreadRunner.scala:314)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:225)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:196)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:341)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:341)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:186)
at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:340)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:196)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:125)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:347)
Expected Behaviour
Spark should successfully read the data from the ClickHouse table. The UInt64 ClickHouse type should be mapped to a compatible Spark SQL type, such as DecimalType(20, 0), which covers the full unsigned 64-bit range, or LongType when all values fit in a signed 64-bit integer.
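For illustration only, the expected mapping could look like the following sketch. This is a hypothetical helper, not part of Spark or the driver; the wider types it covers come up in the comments further down:

from pyspark.sql.types import DataType, DecimalType, StringType

# Hypothetical helper illustrating the expected ClickHouse-to-Spark mapping.
def clickhouse_to_spark_type(name: str) -> DataType:
    # Strip a Nullable(...) wrapper, e.g. "Nullable(UInt64)" -> "UInt64".
    if name.startswith("Nullable(") and name.endswith(")"):
        name = name[len("Nullable("):-1]
    if name == "UInt64":
        # DecimalType(20, 0) covers the full unsigned 64-bit range.
        return DecimalType(20, 0)
    if name in ("Int128", "UInt128", "Int256", "UInt256"):
        # Wider than Spark's maximum decimal precision (38); fall back to string.
        return StringType()
    raise ValueError(f"unmapped ClickHouse type: {name}")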
Code Example
ch_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:clickhouse://ip_address:port/spark_data") \
    .option("dbtable", "(SELECT * FROM spark_data.test_uint64_table) as source_tmp") \
    .option("user", "user") \
    .option("password", "password") \
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .load()
ch_df.show()
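A possible workaround, sketched but not verified here: cast the UInt64 column inside the pushed-down subquery so the driver reports a type Spark does recognize, assuming the driver surfaces ClickHouse Decimal(20, 0) as java.sql.Types.DECIMAL:

# Workaround sketch: push the conversion down to ClickHouse so the column
# arrives as Decimal(20, 0), which Spark maps to DecimalType(20, 0).
ch_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:clickhouse://ip_address:port/spark_data") \
    .option("dbtable", "(SELECT CAST(id AS Decimal(20, 0)) AS id, event_name FROM spark_data.test_uint64_table) as source_tmp") \
    .option("user", "user") \
    .option("password", "password") \
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .load()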
Configuration
Client Configuration
Environment
- Client version: clickhouse-jdbc-0.9.0.jar
- Language version: Apache Spark 4.0 (PySpark)
- OS: Ubuntu 22.04
ClickHouse Server
- ClickHouse Server version: 25.4.3.22
- ClickHouse Server non-default settings, if any: None
CREATE TABLE spark_data.test_uint64_table
(
`id` UInt64,
`event_name` String
)
ENGINE = MergeTree()
ORDER BY id;
INSERT INTO spark_data.test_uint64_table (id, event_name) VALUES (18446744073709551615, 'max_value_event');
INSERT INTO spark_data.test_uint64_table (id, event_name) VALUES (1, 'regular_event');
Any updates on this?
It's the same situation with other data types as well: Int128, Int256, UInt128, UInt256:

pyspark.errors.exceptions.connect.UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: Nullable(Int128), id: OTHER. SQLSTATE: 42704
pyspark.errors.exceptions.connect.UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: Nullable(Int256), id: OTHER. SQLSTATE: 42704
pyspark.errors.exceptions.connect.UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: Nullable(UInt128), id: OTHER. SQLSTATE: 42704
pyspark.errors.exceptions.connect.UnknownException: (org.apache.spark.SparkSQLException) [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: Nullable(UInt256), id: OTHER. SQLSTATE: 42704
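Note that the cast workaround from the Code Example section does not extend cleanly to these wider types, since Int128/UInt128/Int256/UInt256 exceed Spark's maximum DecimalType precision of 38. A sketch that pushes a string conversion down to ClickHouse instead (table and column names here are hypothetical):

# Workaround sketch for types wider than Decimal(38, 0): convert to String
# in ClickHouse, so Spark sees the column as StringType.
ch_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:clickhouse://ip_address:port/spark_data") \
    .option("dbtable", "(SELECT toString(big_id) AS big_id, event_name FROM spark_data.test_int256_table) as source_tmp") \
    .option("user", "user") \
    .option("password", "password") \
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .load()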
Same here: org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: UInt64, id: OTHER. SQLSTATE: 42704