flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-36663][Window]Fix the first processWatermark has extra data after restore by restore timeService's watermark.

Open xing1mo opened this issue 1 year ago • 7 comments

What is the purpose of the change

Restore the currentWatermark of WindowTimerService. If not restored, the slidingWindow will output extra data.

Brief change log

Restore the currentWatermark of WindowTimerService.

Verifying this change

This change added tests and can be verified as follows:

  • Added test that validates whether the hopping window restores the output data consistently.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

xing1mo avatar Nov 11 '24 06:11 xing1mo

CI report:

  • 373adb9fe0cf38ecbc2a842bc684aa0da3c6bc43 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Nov 11 '24 06:11 flinkbot

@xing1mo The Azure tests are failing in testCheckpointRescalingInKeyedState.

davidradl avatar Nov 29 '24 11:11 davidradl

Reviewed by Chi on 05/12/24. Asked submitter questions

davidradl avatar Dec 05 '24 14:12 davidradl

@xing1mo The Azure tests are failing in testCheckpointRescalingInKeyedState. This is a known problem and there is a certain probability that it will occur. https://issues.apache.org/jira/browse/FLINK-36613

xing1mo avatar Dec 06 '24 13:12 xing1mo

@flinkbot run azure

xing1mo avatar Dec 06 '24 13:12 xing1mo

@flinkbot run azure

xing1mo avatar Jan 16 '25 03:01 xing1mo

@flinkbot run azure

xing1mo avatar Jan 20 '25 06:01 xing1mo

I'm wondering if other window operators have this bug like window join, window rank, window dedup...

截屏2025-07-04 18 48 02

Not necessarily. The problem with windowAgg is that the currentWatermark in the timer service is used to determine the window file during agg. I don't know much about the logic of other operators.

xing1mo avatar Jul 04 '25 10:07 xing1mo

Not necessarily. The problem with windowAgg is that the currentWatermark in the timer service is used to determine the window file during agg. I don't know much about the logic of other operators.

OK, I have created a separate jira for it https://issues.apache.org/jira/browse/FLINK-38055

xuyangzhong avatar Jul 07 '25 10:07 xuyangzhong

@xing1mo Could you please bp this fix to 2.1 and 2.0?

xuyangzhong avatar Jul 08 '25 07:07 xuyangzhong