[SPARK-48511][SS] Remove TimeMode None from TransformWithState.
What changes were proposed in this pull request?
This PR removes TimeMode None from the time modes supported by the TransformWithState operator. A structured streaming query runs in either processing time mode or event time mode, depending on whether eventTime has been specified, so this change aligns TimeMode with streaming query semantics.
Note that if eventTimeColumn is specified for the output dataset in the transformWithState operator, TimeMode defaults to EventTime.
Why are the changes needed?
These changes are needed to align the TimeMode values with how time flows in a streaming query.
Does this PR introduce any user-facing change?
Yes. This modifies the TimeMode semantics for transformWithState: None is no longer a supported time mode.
How was this patch tested?
All existing unit tests pass.
Was this patch authored or co-authored using generative AI tooling?
No
Mind filing a JIRA please?
@HeartSaVioR PTAL, thanks.
Also, let's revisit the UX. If users use neither timeout nor TTL, they could just use None regardless of whether they have an event time column. Given that we remove None, what is the expected UX? Does the user have to specify one of the two modes based on what they have (event time column set or not), even though they never use timeout? If so, that doesn't sound like a better UX.
Even with TimeMode None, the query has an event time (if eventTimeColumn is specified) and a processing time (based on the driver's clock). I think this makes TimeMode None confusing. As a general rule of thumb, users should set TimeMode to EventTime if eventTimeColumn is specified, and use ProcessingTime otherwise.
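To make that rule of thumb concrete, here is a minimal sketch in plain Python. It does not use the Spark API: the `TimeMode` enum is only a stand-in for the two modes that remain after this PR, and `choose_time_mode` is a hypothetical helper, not something Spark provides.

```python
from enum import Enum
from typing import Optional


class TimeMode(Enum):
    # Stand-in for the two modes left once None is removed.
    # This is NOT Spark's TimeMode class, only an illustration.
    PROCESSING_TIME = "ProcessingTime"
    EVENT_TIME = "EventTime"


def choose_time_mode(event_time_column: Optional[str]) -> TimeMode:
    """Hypothetical helper: derive the time mode from the event time column.

    EventTime when an event time column is specified, ProcessingTime otherwise.
    """
    if event_time_column:
        return TimeMode.EVENT_TIME
    return TimeMode.PROCESSING_TIME


# A query with an event time column vs. one without.
print(choose_time_mode("eventTime").value)
print(choose_time_mode(None).value)
```

Under this rule the mode is fully determined by whether an event time column is set, so the user has no third choice to make. That is also why the UX question above matters for queries that use neither timers nor TTL.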
Just leaving some history here: currently, processing time mode triggers batches continuously, which is not acceptable if there is no timer/TTL to check. We are now discussing how to handle this case.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!