beam
beam copied to clipboard
[#30789] Add support for Flink 1.18
This PR tries to solve #30789 by adding a runner for Flink 1.18. So far I haven't identified any changes that need to be done beyond the version bump.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [X] Mention the appropriate issue in your description (for example:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead. - [x] Update
CHANGES.md
with noteworthy changes. - [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.
See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
As I thought, seems there's a flaky test: https://github.com/apache/beam/actions/runs/8760406645/job/24045305260
--- FAIL: TestElementChan (0.00s)
--- FAIL: TestElementChan/FillBufferThenAbortThenRead (0.00s)
datamgr_test.go:412: got sum 5, count 5, want sum 20, count 20
FAIL
I'm gonna take a look at it to see if I can identify what caused the previous run to fail. That being said, Go is not my forte.
Thanks for the work! Yeah the flaky go SDK test is irrelevant
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer
:
R: @jrmccluskey for label go. R: @Abacn for label build.
Available commands:
-
stop reviewer notifications
- opt out of the automated review tooling -
remind me after tests pass
- tag the comment author after tests pass -
waiting on author
- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
R: @je-ik
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Looks like this is all green and ready to merge. Can you please squash the commits to single commit? Thanks. :+1:
Actually committer can "squash and merge" using GitHub website. No further action needed for PR author.
@Abacn Cool, I was not aware of that :+1: @thebozzcl Thanks!
@Abacn nice, I wasn't aware of that either! @je-ik thanks for taking care of it!
The task :runners:flink:1.18:runQuickstartJavaFlinkLocal
has failed to run due to this commit, which leads to the recent failure of our github action "https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml".
Below is the latest error from the log:
2024-05-10T16:28:07.4342882Z Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
2024-05-10T16:28:07.4344273Z at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
2024-05-10T16:28:07.4345556Z at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
2024-05-10T16:28:07.4346644Z at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
2024-05-10T16:28:07.4347805Z at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
2024-05-10T16:28:07.4348942Z at java.lang.Class.forName0(Native Method)
2024-05-10T16:28:07.4349661Z at java.lang.Class.forName(Class.java:348)
2024-05-10T16:28:07.4350979Z at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
2024-05-10T16:28:07.4352850Z at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1988)
2024-05-10T16:28:07.4354599Z at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
2024-05-10T16:28:07.4356870Z at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
2024-05-10T16:28:07.4358955Z at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
2024-05-10T16:28:07.4360298Z at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
2024-05-10T16:28:07.4362035Z at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
2024-05-10T16:28:07.4364329Z at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
2024-05-10T16:28:07.4367296Z at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
2024-05-10T16:28:07.4368928Z at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:67)
2024-05-10T16:28:07.4370661Z at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:101)
2024-05-10T16:28:07.4373136Z at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
2024-05-10T16:28:07.4375581Z at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
2024-05-10T16:28:07.4376991Z at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356)
2024-05-10T16:28:07.4379145Z at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
2024-05-10T16:28:07.4382049Z at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
2024-05-10T16:28:07.4384288Z at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
2024-05-10T16:28:07.4385674Z ... 4 more
+@damccorm
Hmm. I'm gonna try looking into it, but I have limited connectivity at the moment and might not be able to do much until later this week. What's the preferred way to deal with this? Do we want to roll back?
I can confirm the failure locally, we should revert this until we can make the check pass.
https://github.com/apache/beam/pull/31262
Correction. The test locally fails even before this commit (e.g. tested 49056fd7ed111c4c9ebb9236ab447d39f0aa86f3
with 1.17). The check seems to have independent issues.
I took a "quick" look (it took a while to dl all the dependencies in a connection that's not much better than dial-up). I'm kind of baffled by this issue - I checked the flink-core
library and org.apache.flink.api.common.ExecutionConfig
hasn't been moved around at all since 1.14.
We should probably create a separate ticket for this.
@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".
@je-ik PostRelease test is different from other tests. It downloads the latest Nightly Snapshot and run some validation pipelines. It does not build beam on the current branch, and that is why you see "The test locally fails even before this commit".
@je-ik, @Abacn: Given the task :runners:flink:1.18:runQuickstartJavaFlinkLocal
is the failed task and it was introduced by this commit, can we revert it and see if the test is fixed first? We can look into other test issues if they emerge after that.
Looks like someone have reported this issue on Flink 1.18 before.
- https://www.mail-archive.com/[email protected]/msg52035.html
- https://www.mail-archive.com/[email protected]/msg52467.html
I agree revert at the change moment
:runners:flink:1.17:runQuickstartJavaFlinkLocal
passed on Beam 2.56.0:
- https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot.yml
:runners:flink:1.17:runQuickstartJavaFlinkLocal
(as well as :1.18:runQuickstartJavaFlinkLocal
) failed on master:
- https://github.com/apache/beam/actions/runs/9068053305
This means current master branch is failing Flink runner validation for both Flink 1.18 and under 1.18, possibly caused by this PR.
If we revert the change and wait for another nightly snapshot built, and the PostRelease test is back green, we can confirm
if revert, there is another PR need to be reverted at the same time : #31217
This one included a typo fix which shouldn't be reverted : https://github.com/apache/beam/pull/31217/files#diff-07d5f4d101410e8cf42b7241fe72074d3c8003866e2b077f51388e1283ebe442R337
Checking this PR again, it does not include non-trivial change. I doubt that revert it will make the flinkQuickstart back green again though
Just for posterity and everyone else getting notifications. As release manager I learned just now that we need a tickets such as this whenever we add a new version: https://issues.apache.org/jira/browse/INFRA-25880
These can take a few days to get resolved, so it is good to do it right away as part of adding the new version support.
A proposal to use single dockerhub repository for Flink job server container: #31631