[FLINK-28829][k8s] Support prepreparing K8S resources before JM creation

Open bzhaoopenstack opened this issue 3 years ago • 1 comments

This PR adds support for creating additional Kubernetes resources before the JobManager (JM) Deployment is created.

What is the purpose of the change

To support customized Kubernetes schedulers, the current code needs to cover every point at which such a scheduler may need extra resources while it schedules the actual pods: before, during, and after the creation of our target resources. This PR adds the "before" case; the current code already covers the "during" and "after" cases.

Brief change log

  1. Introduce a new interface method that all decorators can implement to build pre-prepared Kubernetes resources.
  2. Add an attribute to the JM spec that stores the list of pre-prepared resources.
  3. Extend the existing owner-reference logic so that it also refreshes the owner reference of the pre-prepared resources.
  4. Add the ability to create Kubernetes resources before the JM Deployment is created in the fabric8 client.
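
The four steps above can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in this PR: `Resource`, `Decorator`, `JobManagerSpec`, `RecordingClient`, `buildPrePreparedResources`, and the `PodGroup` example resource are all names invented for illustration, and none of them are real Flink or fabric8 types. It also assumes the owner reference can only be filled in once the Deployment exists, so the pre-prepared resources are created first and updated afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model; these are not Flink or fabric8 classes. */
public class PrePreparedResourcesSketch {

    /** Stand-in for a Kubernetes object with a single optional owner. */
    static class Resource {
        final String kind;
        final String name;
        String ownerUid; // null until the owner reference is refreshed

        Resource(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    /** Step 1: a decorator may contribute resources that must exist before the JM. */
    interface Decorator {
        default List<Resource> buildPrePreparedResources() {
            return Collections.emptyList();
        }
    }

    /** Step 2: the JM spec stores the list of pre-prepared resources. */
    static class JobManagerSpec {
        final Resource deployment;
        final List<Resource> prePreparedResources = new ArrayList<>();

        JobManagerSpec(String clusterId, List<Decorator> decorators) {
            this.deployment = new Resource("Deployment", clusterId);
            for (Decorator decorator : decorators) {
                prePreparedResources.addAll(decorator.buildPrePreparedResources());
            }
        }
    }

    /** Stand-in for the Kubernetes client; it only records calls in order. */
    static class RecordingClient {
        final List<String> calls = new ArrayList<>();

        void create(Resource r) {
            calls.add("create " + r.kind + "/" + r.name);
        }

        void update(Resource r) {
            calls.add("update " + r.kind + "/" + r.name);
        }

        /** Pretends the API server assigned a UID to the created Deployment. */
        String createDeployment(Resource deployment) {
            create(deployment);
            return "uid-" + deployment.name;
        }
    }

    /** Steps 3 and 4: create the extra resources first, then refresh their owner. */
    static void deploy(RecordingClient client, JobManagerSpec spec) {
        for (Resource r : spec.prePreparedResources) {
            client.create(r);
        }
        String deploymentUid = client.createDeployment(spec.deployment);
        for (Resource r : spec.prePreparedResources) {
            r.ownerUid = deploymentUid;
            client.update(r);
        }
    }

    public static void main(String[] args) {
        // Illustrative decorator for a custom scheduler that needs one extra resource.
        Decorator schedulerDecorator = new Decorator() {
            @Override
            public List<Resource> buildPrePreparedResources() {
                return Collections.singletonList(new Resource("PodGroup", "my-cluster-group"));
            }
        };
        JobManagerSpec spec =
                new JobManagerSpec("my-cluster", Collections.singletonList(schedulerDecorator));
        RecordingClient client = new RecordingClient();
        deploy(client, spec);
        client.calls.forEach(System.out::println);
    }
}
```

Giving the decorator method a default empty result means existing decorators need no change; only a decorator for a custom scheduler would override it.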

Verifying this change

This change cannot be verified end to end at the moment, because it is an internal process in flink-kubernetes and does not affect the user interface. The only way to test it is to mock a JM spec that contains additional Kubernetes resources not managed by Flink, then check that those resources are created before the JM and that their owner reference is refreshed correctly (Flink cluster id / deployment id).

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

bzhaoopenstack avatar Aug 08 '22 11:08 bzhaoopenstack

CI report:

  • e9939ecf94a99d71a40dc9f21b894e65121544b1 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Aug 08 '22 12:08 flinkbot

@flinkbot run azure

bzhaoopenstack avatar Oct 11 '22 00:10 bzhaoopenstack

Hi! As you probably know, this PR is introducing some breaking changes. Our academic tool BreakBot helps developers to better assess the impact that these BCs may have on client projects. While this can be on purpose, we found out that some clients among the most popular on GitHub may be faced with broken code. You can find the full BreakBot report in our fork repository: report for PR.

We hope this information is valuable to you, and apologize otherwise. If you're willing to help, we would kindly ask you to fill in a 5-minute survey about the report. Your feedback will help us improve the tool and help us in our research!

jrfaller avatar Nov 22 '22 14:11 jrfaller