
[FLINK-28915] Flink Native K8s mode jar location supports the S3 scheme.

Open SwimSweet opened this issue 2 years ago • 10 comments

What is the purpose of the change

Kubernetes Native K8s Application Mode and Standalone Application Mode support fetching the user jar from DFS schemes (S3, OSS, HDFS, etc.).

Brief change log

  • Fetch the user jar from a DFS (S3, OSS, HDFS, etc.) before starting the Flink cluster.
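
As a rough illustration of the idea above (not the code in this PR), the fetch step can be sketched as: look at the jar URI's scheme, treat `local` as "already inside the image", and copy anything else into a local directory before the cluster starts. The class name `ArtifactFetchSketch` and both method names are hypothetical. The sketch uses plain `java.net` streams, so it only handles `file` and `http(s)` URIs; S3, OSS and HDFS would presumably go through Flink's own filesystem abstraction instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of a pre-start jar fetch; names are illustrative only. */
public final class ArtifactFetchSketch {

    /** Returns true when the jar has to be copied locally before startup. */
    public static boolean needsFetch(URI jarUri) {
        String scheme = jarUri.getScheme();
        // Assumption: "local" (or no scheme) means the jar already sits in the image.
        return scheme != null && !"local".equals(scheme);
    }

    /** Copies the jar into targetDir if needed and returns its local path. */
    public static Path fetch(URI jarUri, Path targetDir) throws IOException {
        if (!needsFetch(jarUri)) {
            return Paths.get(jarUri.getPath());
        }
        Files.createDirectories(targetDir);
        String fileName = Paths.get(jarUri.getPath()).getFileName().toString();
        Path target = targetDir.resolve(fileName);
        // java.net.URL only knows file: and http(s): here; other schemes
        // (s3, oss, hdfs) fail with MalformedURLException in this sketch.
        try (InputStream in = jarUri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

For example, `ArtifactFetchSketch.fetch(URI.create("local:///opt/flink/usrlib/job.jar"), dir)` returns the in-image path unchanged, while a `file:` or `http:` URI is first copied into `dir`.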

Verifying this change

This change added tests and can be verified as follows:

  • Removed the testDeployApplicationClusterWithNonLocalSchema test
  • Added tests that fetch the jar via the http scheme
  • Added tests that fetch the jar via the file scheme
  • Added a test that creates an emptyDir for saving user artifacts
  • Manually verified the local, file, OSS, HDFS with Kerberos, and S3 resources.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: Native Kubernetes Application Mode: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

SwimSweet avatar Sep 07 '22 17:09 SwimSweet

CI report:

  • c0efcfe43c465daf45fff83154d7469fbd1e4f93 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Sep 07 '22 17:09 flinkbot

I created a new PR based on master. Please take a look when you are free, @wangyang0918 @Aitozi. Thanks.

SwimSweet avatar Sep 08 '22 06:09 SwimSweet

@SwimSweet Sorry for the late response. I believe this PR could work. However, my biggest concern is that it only works for native K8s application mode. AFAIK, the Yarn application mode and standalone mode should also benefit from this.

wangyang0918 avatar Oct 26 '22 03:10 wangyang0918

@wangyang0918 I will work on supporting Yarn application mode and standalone mode. I found that Yarn application mode already has a similar feature, which provides the yarn.provided.lib.dirs and yarn.provided.usrlib.dir parameters. But it seems that this feature only supports the Hadoop file system?

SwimSweet avatar Oct 26 '22 16:10 SwimSweet

@SwimSweet Yes. The user jar for Yarn application mode can only be an HDFS file. However, I believe that is enough, since using the Yarn distributed cache is more appropriate than downloading via http or the Flink filesystem directly. This also means that we just need to add support for standalone mode in this PR.

wangyang0918 avatar Oct 27 '22 02:10 wangyang0918

@flinkbot run azure

SwimSweet avatar Nov 17 '22 08:11 SwimSweet

@flinkbot run azure

SwimSweet avatar Nov 19 '22 09:11 SwimSweet

@wangyang0918 I have finished the work for standalone mode. Please take another look.

SwimSweet avatar Nov 20 '22 01:11 SwimSweet

@wangyang0918 Please take a look. Thanks.

SwimSweet avatar Feb 12 '23 16:02 SwimSweet

Hi @SwimSweet, given your lack of response, @ferenc-csaky offered to build on your work and take it forward based on my comments above. Hope that is OK; he is planning to post an updated PR next week.

mbalassi avatar Jan 05 '24 10:01 mbalassi

Closing this as the relevant work has been merged as part of #24065.

mbalassi avatar Jan 25 '24 08:01 mbalassi