spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-54609][SQL] Disable TIME type by default

Open davidm-db opened this issue 3 weeks ago • 10 comments

What changes were proposed in this pull request?

Introducing a new SQL config for TIME type: spark.sql.timeType.enabled.

The default value is false and it is enabled only in tests.

Why are the changes needed?

TIME data type support is not complete, so we need to guard it before it is completed, especially ahead of Spark 4.1 release.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Need to add tests for disabled config.

Was this patch authored or co-authored using generative AI tooling?

davidm-db avatar Dec 05 '25 14:12 davidm-db

cc @MaxGekk , @uros-db , @cloud-fan , @HyukjinKwon , @viirya , @peter-toth , @yaooqinn , @LuciferYang , @vinodkc

dongjoon-hyun avatar Dec 05 '25 17:12 dongjoon-hyun

Could you answer the above comments and make this PR pass the CIs for further discussion, @davidm-db ?

dongjoon-hyun avatar Dec 06 '25 23:12 dongjoon-hyun

TimeType is marked as Unstable. Is this short-term prohibition actually required?

yaooqinn avatar Dec 07 '25 06:12 yaooqinn

Yes, we need this to protect the users from the accidental use of unfinished work, @yaooqinn .

TimeType is marked as Unstable. Is this short-term prohibition actually required?

dongjoon-hyun avatar Dec 07 '25 18:12 dongjoon-hyun

can we add some test cases following the geo types blocking PRs?

cloud-fan avatar Dec 08 '25 15:12 cloud-fan

BTW, given that @yaooqinn 's comment, while working on this PR, we need to build a consensus on this by sending out an email to dev@spark mailing list, @davidm-db , @uros-db , and @cloud-fan .

It would be enough to reply on RC2 email about the TIME type. Maybe, could you send out the decision clearly to the mailing list, please, @cloud-fan , because @MaxGekk is not in the loop yet?

@dongjoon-hyun can you clarify? what is the decision that you would like to make? is it whether or not the flag is required in the first place or something else?

davidm-db avatar Dec 08 '25 23:12 davidm-db

To @davidm-db , it's about removing the whole TIME type feature from Apache Spark 4.1.0 release note in order to make this as 4.2.0 feature officially. This is a part of that activity, isn't it?

Screenshot 2025-12-08 at 16 00 07

dongjoon-hyun avatar Dec 09 '25 00:12 dongjoon-hyun

The flag is only one way to do it. There are many alternatives like reverting all TIME-related commits in the worst case. flag is the best option because it still allows accessibility for testing.

dongjoon-hyun avatar Dec 09 '25 00:12 dongjoon-hyun

@dongjoon-hyun @cloud-fan I think I've resolved all of the comments. I'll make sure tonight that after my latest changes all CIs are passing. Tomorrow, I'll add some negative tests as Wenchen suggested (i.e. TIME really doesn't work when flag is disabled) and I think that should be it for this PR.

davidm-db avatar Dec 09 '25 21:12 davidm-db

Thank you, @davidm-db .

dongjoon-hyun avatar Dec 09 '25 22:12 dongjoon-hyun

I verified manually that it's blocked properly.

scala> spark.sql("CREATE TABLE t(c TIME)")
org.apache.spark.sql.catalyst.parser.ParseException:
[UNSUPPORTED_DATATYPE] Unsupported data type "TIME". SQLSTATE: 0A000
== SQL (line 1, position 18) ==
CREATE TABLE t(c TIME)
                 ^^^^

Given that, shall we proceed those test case as a follow-up, @davidm-db and @cloud-fan ?

dongjoon-hyun avatar Dec 10 '25 23:12 dongjoon-hyun

I removed all client side checks as it's not very meaningful. People can use the TimeType class directly and there is no point to only block SQL. People can use their own Spark Connect client which is out of our control. I think the server side checks should be sufficient: we block time related functions, and we block query result with time type (either collect the rows or write to data sources). This is also how we block geo types.

cloud-fan avatar Dec 11 '25 01:12 cloud-fan

So, is this the final status from your side, @cloud-fan ? Then, could you give your approval?

dongjoon-hyun avatar Dec 11 '25 02:12 dongjoon-hyun

Could you cancel all previously launched the GitHub Action CI in order to make run the last commit? It seems that the last commit didn't get the resource yet.

Screenshot 2025-12-10 at 18 54 16

dongjoon-hyun avatar Dec 11 '25 02:12 dongjoon-hyun

I manually verified the compilation and Scalalinter and both on and off behavior manually.

scala> sql("create table t(a TIME)").show()
org.apache.spark.sql.AnalysisException: [UNSUPPORTED_TIME_TYPE] The data type TIME is not supported. SQLSTATE: 0A000

dongjoon-hyun avatar Dec 11 '25 03:12 dongjoon-hyun

Merged to master.

~Could you make a backporting PR, @davidm-db and @cloud-fan ? There exist conflicts in branch-4.1.~

Never mind. I resolved the conflicts and am testing locally on branch-4.1 now.

dongjoon-hyun avatar Dec 11 '25 03:12 dongjoon-hyun

Merged to branch-4.1, too.

dongjoon-hyun avatar Dec 11 '25 03:12 dongjoon-hyun