[SPARK-54609][SQL] Disable TIME type by default
What changes were proposed in this pull request?
This PR introduces a new SQL config for the TIME type: spark.sql.timeType.enabled.
The default value is false; the type is enabled only in tests.
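Roughly, the new entry in SQLConf would look something like the following (a sketch only, not the exact code from this PR; the doc text and version string are placeholders):

val TIME_TYPE_ENABLED = buildConf("spark.sql.timeType.enabled")
  .doc("When true, the TIME data type is allowed; it stays disabled by default " +
    "until TIME support is complete.")
  .version("4.1.0")
  .booleanConf
  .createWithDefault(false)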
Why are the changes needed?
TIME data type support is not complete yet, so we need to guard it until it is finished, especially ahead of the Spark 4.1 release.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tests for the disabled config still need to be added.
Was this patch authored or co-authored using generative AI tooling?
cc @MaxGekk , @uros-db , @cloud-fan , @HyukjinKwon , @viirya , @peter-toth , @yaooqinn , @LuciferYang , @vinodkc
Could you answer the above comments and make this PR pass the CIs for further discussion, @davidm-db ?
TimeType is marked as Unstable. Is this short-term prohibition actually required?
Yes, we need this to protect users from accidentally using unfinished work, @yaooqinn.
Can we add some test cases following the geo type blocking PRs?
BTW, given @yaooqinn's comment, while working on this PR we need to build consensus by sending an email to the dev@spark mailing list, @davidm-db, @uros-db, and @cloud-fan.
It would be enough to reply on the RC2 email thread about the TIME type. Could you send out the decision clearly to the mailing list, please, @cloud-fan, since @MaxGekk is not in the loop yet?
@dongjoon-hyun can you clarify? What is the decision that you would like to make? Is it whether or not the flag is required in the first place, or something else?
To @davidm-db: it's about removing the whole TIME type feature from the Apache Spark 4.1.0 release notes in order to make it officially a 4.2.0 feature. This is a part of that activity, isn't it?
The flag is only one way to do it. There are alternatives, such as reverting all TIME-related commits in the worst case. The flag is the best option because it still keeps the feature accessible for testing.
@dongjoon-hyun @cloud-fan I think I've resolved all of the comments. I'll make sure tonight that after my latest changes all CIs are passing.
Tomorrow I'll add some negative tests as Wenchen suggested (i.e., that TIME really doesn't work when the flag is disabled), and I think that should be it for this PR.
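Roughly, I have something like the following in mind (a sketch only; the suite name is hypothetical and the exact error class raised by the guard is an assumption):

import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.test.SharedSparkSession

// Hypothetical suite: verifies that TIME is rejected while the flag is off.
class TimeTypeDisabledSuite extends QueryTest with SharedSparkSession {
  test("TIME is rejected while spark.sql.timeType.enabled is false") {
    withSQLConf("spark.sql.timeType.enabled" -> "false") {
      val e = intercept[ParseException] {
        sql("CREATE TABLE t_time(c TIME) USING parquet")
      }
      // Assumes the parser-level guard reports UNSUPPORTED_DATATYPE.
      assert(e.getMessage.contains("UNSUPPORTED_DATATYPE"))
    }
  }
}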
Thank you, @davidm-db .
I verified manually that it's blocked properly.
scala> spark.sql("CREATE TABLE t(c TIME)")
org.apache.spark.sql.catalyst.parser.ParseException:
[UNSUPPORTED_DATATYPE] Unsupported data type "TIME". SQLSTATE: 0A000
== SQL (line 1, position 18) ==
CREATE TABLE t(c TIME)
^^^^
Given that, shall we proceed with those test cases as a follow-up, @davidm-db and @cloud-fan?
I removed all client-side checks as they're not very meaningful. People can use the TimeType class directly, so there is no point in only blocking SQL, and people can use their own Spark Connect client, which is out of our control. I think the server-side checks should be sufficient: we block time-related functions, and we block query results with the TIME type (either collecting the rows or writing to data sources). This is also how we block geo types.
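Conceptually, the server-side guard on query results amounts to recursively scanning the output schema for TIME; a minimal sketch under that assumption (the helper name and where it would be invoked are hypothetical, not the actual code):

import org.apache.spark.sql.types._

// Hypothetical helper: returns true if the data type contains TIME anywhere,
// including inside arrays, maps, and nested structs.
def containsTimeType(dt: DataType): Boolean = dt match {
  case _: TimeType        => true
  case ArrayType(et, _)   => containsTimeType(et)
  case MapType(kt, vt, _) => containsTimeType(kt) || containsTimeType(vt)
  case StructType(fields) => fields.exists(f => containsTimeType(f.dataType))
  case _                  => false
}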
So, is this the final status from your side, @cloud-fan? If so, could you give your approval?
Could you cancel all previously launched GitHub Actions CI runs so that the last commit can run? It seems the last commit hasn't gotten resources yet.
I manually verified compilation, the Scala linter, and both the on and off behavior.
scala> sql("create table t(a TIME)").show()
org.apache.spark.sql.AnalysisException: [UNSUPPORTED_TIME_TYPE] The data type TIME is not supported. SQLSTATE: 0A000
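For the "on" half of the check, a rough sketch (this assumes the conf can be flipped at runtime in the same session; if it turns out to be a static conf, it would need to be set at launch instead):

scala> spark.conf.set("spark.sql.timeType.enabled", "true")
scala> sql("create table t(a TIME)")   // expected to succeed once the flag is enabled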
Merged to master.
~Could you make a backporting PR, @davidm-db and @cloud-fan? There are conflicts in branch-4.1.~
Never mind. I resolved the conflicts and am testing locally on branch-4.1 now.
Merged to branch-4.1, too.