
[flink] #31390 emit watermark with empty source

Open je-ik opened this issue 1 year ago • 5 comments

Closes #31390


je-ik avatar May 24 '24 09:05 je-ik

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] avatar May 24 '24 14:05 github-actions[bot]

R: @Abacn

The PVR test seems to be stuck at ViewTest.testTriggeredLatestSingleton. I can observe this locally on both the master and release-2.56.0 branches. Has this check completed successfully recently?

je-ik avatar May 24 '24 15:05 je-ik

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

github-actions[bot] avatar May 24 '24 15:05 github-actions[bot]

The PVR test seems to be stuck at ViewTest.testTriggeredLatestSingleton. I can observe this locally on both the master and release-2.56.0 branches. Has this check completed successfully recently?

Hm, it passed on the second run.

je-ik avatar May 24 '24 15:05 je-ik

Thanks, taking a look.

In the meantime, I have a couple of questions (not directly related to the change):

  • This sounds similar to #30969. What is the difference here?

  • I also observed a similar issue with JmsIO on the Dataflow runner ("watermark does not increase when there is no incoming data for a while"), and the fix in #30337 didn't work. I am wondering whether #31390 is generic at the SDK level and whether a fix could be proposed in general?

Abacn avatar May 24 '24 18:05 Abacn

* This sounds similar to [[runners-flink] Fix watermark emission for empty splits (#29816) #30969](https://github.com/apache/beam/pull/30969). What is the difference here?

The fix in #30969 was related, but different. A source can be empty temporarily or permanently. A permanently empty source signals this by moving its watermark to infinity. The split can then be closed, which lets the watermark advance, because a closed split no longer holds the watermark back.

This PR fixes the other case: the source is not emitting any data, but instead of moving its watermark to infinity it advances the watermark according to some idle-source policy. Before this PR, no watermark was emitted downstream until the source had emitted at least one element. This is fixed now.
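
To make the difference concrete, here is a toy model of the behaviour described above. It is only a sketch and not the FlinkRunner code: the function `emitted_watermarks`, the `emit_when_empty` flag, and the representation of a poll as an `(element, watermark)` pair are all hypothetical names introduced for illustration.

```python
from typing import List, Optional, Tuple

# One poll of the (hypothetical) source: the element read, or None if the
# source had no data, plus the watermark the source reported at that moment.
Step = Tuple[Optional[str], float]


def emitted_watermarks(steps: List[Step], emit_when_empty: bool) -> List[float]:
    """Return the watermarks forwarded downstream for a sequence of polls."""
    emitted: List[float] = []
    seen_element = False
    last = float("-inf")
    for element, source_watermark in steps:
        if element is not None:
            seen_element = True
        # emit_when_empty=False models the behaviour before this PR:
        # nothing is forwarded until the first element has been read.
        if not (seen_element or emit_when_empty):
            continue
        # Watermarks only ever move forward.
        if source_watermark > last:
            last = source_watermark
            emitted.append(source_watermark)
    return emitted


# An idle source: no elements, but its idle policy keeps advancing the watermark.
idle_source: List[Step] = [(None, 10.0), (None, 20.0), (None, 30.0)]

before_fix = emitted_watermarks(idle_source, emit_when_empty=False)
after_fix = emitted_watermarks(idle_source, emit_when_empty=True)
```

In this model, `before_fix` is empty because the idle source never produces an element, while `after_fix` contains every watermark the idle policy reported.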

* I also observed a similar issue with JmsIO on the Dataflow runner ("watermark does not increase when there is no incoming data for a while"), and the fix in [[DRAFT] Attempt fix Jms watermark #30337](https://github.com/apache/beam/pull/30337) didn't work. I am wondering whether [[Bug]: FlinkRunner does not emit watermark with empty source #31390](https://github.com/apache/beam/issues/31390) is generic at the SDK level and whether a fix could be proposed in general?

All these fixes relate to Flink only. The issues were introduced by the source refactoring in FlinkRunner, so there is nothing here that can be extended to the general case.

je-ik avatar May 25 '24 06:05 je-ik