beam icon indicating copy to clipboard operation
beam copied to clipboard

[#30789] Add support for Flink 1.18

Open thebozzcl opened this issue 10 months ago • 6 comments

This PR tries to solve #30789 by adding a runner for Flink 1.18. So far I haven't identified any changes that need to be done beyond the version bump.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [X] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [x] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels Python tests Java tests Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

thebozzcl avatar Apr 19 '24 22:04 thebozzcl

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] avatar Apr 20 '24 00:04 github-actions[bot]

As I thought, seems there's a flaky test: https://github.com/apache/beam/actions/runs/8760406645/job/24045305260

 --- FAIL: TestElementChan (0.00s)
    --- FAIL: TestElementChan/FillBufferThenAbortThenRead (0.00s)
        datamgr_test.go:412: got sum 5, count 5, want sum 20, count 20
FAIL

I'm gonna take a look at it to see if I can identify what caused the previous run to fail. That being said, Go is not my forte.

thebozzcl avatar Apr 22 '24 19:04 thebozzcl

Thanks for the work! Yeah the flaky go SDK test is irrelevant

Abacn avatar Apr 22 '24 19:04 Abacn

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go. R: @Abacn for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

R: @je-ik

je-ik avatar Apr 30 '24 08:04 je-ik

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

Looks like this is all green and ready to merge. Can you please squash the commits to single commit? Thanks. :+1:

je-ik avatar May 07 '24 08:05 je-ik

Actually committer can "squash and merge" using GitHub website. No further action needed for PR author.

Abacn avatar May 07 '24 14:05 Abacn

@Abacn Cool, I was not aware of that :+1: @thebozzcl Thanks!

je-ik avatar May 07 '24 14:05 je-ik

@Abacn nice, I wasn't aware of that either! @je-ik thanks for taking care of it!

thebozzcl avatar May 07 '24 16:05 thebozzcl

The task :runners:flink:1.18:runQuickstartJavaFlinkLocal has failed to run due to this commit, which leads to the recent failure of our github action "https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml".

Below is the latest error from the log:

2024-05-10T16:28:07.4342882Z Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
2024-05-10T16:28:07.4344273Z 	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
2024-05-10T16:28:07.4345556Z 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
2024-05-10T16:28:07.4346644Z 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
2024-05-10T16:28:07.4347805Z 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
2024-05-10T16:28:07.4348942Z 	at java.lang.Class.forName0(Native Method)
2024-05-10T16:28:07.4349661Z 	at java.lang.Class.forName(Class.java:348)
2024-05-10T16:28:07.4350979Z 	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
2024-05-10T16:28:07.4352850Z 	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1988)
2024-05-10T16:28:07.4354599Z 	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
2024-05-10T16:28:07.4356870Z 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
2024-05-10T16:28:07.4358955Z 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
2024-05-10T16:28:07.4360298Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
2024-05-10T16:28:07.4362035Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
2024-05-10T16:28:07.4364329Z 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
2024-05-10T16:28:07.4367296Z 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
2024-05-10T16:28:07.4368928Z 	at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:67)
2024-05-10T16:28:07.4370661Z 	at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:101)
2024-05-10T16:28:07.4373136Z 	at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
2024-05-10T16:28:07.4375581Z 	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
2024-05-10T16:28:07.4376991Z 	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356)
2024-05-10T16:28:07.4379145Z 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
2024-05-10T16:28:07.4382049Z 	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
2024-05-10T16:28:07.4384288Z 	at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
2024-05-10T16:28:07.4385674Z 	... 4 more

+@damccorm

shunping avatar May 12 '24 03:05 shunping

Hmm. I'm gonna try looking into it, but I have limited connectivity at the moment and might not be able to do much until later this week. What's the preferred way to deal with this? Do we want to roll back?

thebozzcl avatar May 12 '24 17:05 thebozzcl

I can confirm the failure locally, we should revert this until we can make the check pass.

https://github.com/apache/beam/pull/31262

je-ik avatar May 13 '24 06:05 je-ik

Correction. The test locally fails even before this commit (e.g. tested 49056fd7ed111c4c9ebb9236ab447d39f0aa86f3 with 1.17). The check seems to have independent issues.

je-ik avatar May 13 '24 07:05 je-ik

I took a "quick" look (it took a while to dl all the dependencies in a connection that's not much better than dial-up). I'm kind of baffled by this issue - I checked the flink-core library and org.apache.flink.api.common.ExecutionConfig hasn't been moved around at all since 1.14.

We should probably create a separate ticket for this.

thebozzcl avatar May 13 '24 18:05 thebozzcl

@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".

Abacn avatar May 13 '24 18:05 Abacn

@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".

@je-ik, @Abacn: Given the task :runners:flink:1.18:runQuickstartJavaFlinkLocal is the failed task and it was introduced by this commit, can we revert it and see if the test is fixed first? We can look into other test issues if they emerge after that.

shunping avatar May 13 '24 18:05 shunping

Looks like someone have reported this issue on Flink 1.18 before.

shunping avatar May 13 '24 18:05 shunping

I agree revert at the change moment

:runners:flink:1.17:runQuickstartJavaFlinkLocal passed on Beam 2.56.0:

  • https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml

:runners:flink:1.17:runQuickstartJavaFlinkLocal (as well as :1.18:runQuickstartJavaFlinkLocal) failed on master:

  • https://github.com/apache/beam/actions/runs/9068053305

This means current master branch is failing Flink runner validation for both Flink 1.18 and under 1.18, possibly caused by this PR.

If we revert the change and wait for another nightly snapshot built, and the PostRelease test is back green, we can confirm

Abacn avatar May 13 '24 19:05 Abacn

if revert, there is another PR need to be reverted at the same time : #31217

This one included a typo fix which shouldn't be reverted : https://github.com/apache/beam/pull/31217/files#diff-07d5f4d101410e8cf42b7241fe72074d3c8003866e2b077f51388e1283ebe442R337

Abacn avatar May 13 '24 19:05 Abacn

Checking this PR again, it does not include non-trivial change. I doubt that revert it will make the flinkQuickstart back green again though

Abacn avatar May 13 '24 19:05 Abacn

Just for posterity and everyone else getting notifications. As release manager I learned just now that we need a tickets such as this whenever we add a new version: https://issues.apache.org/jira/browse/INFRA-25880

These can take a few days to get resolved, so it is good to do it right away as part of adding the new version support.

kennknowles avatar Jun 18 '24 20:06 kennknowles

A proposal to use single dockerhub repository for Flink job server container: #31631

Abacn avatar Jun 18 '24 23:06 Abacn