flink [FLINK-28915] Flink Native k8s mode jar location support s3 scheme.

What is the purpose of the change

Kubernetes native application mode and standalone application mode support fetching the job jar from DFS schemes (S3, OSS, HDFS, etc.).

Brief change log

Fetch the jar from DFS (S3, OSS, HDFS, etc.) before starting the Flink cluster.
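The fetch-before-start step can be illustrated with a minimal sketch. This is not the code from this PR: the class `ArtifactFetchSketch` and its methods are hypothetical names, and the sketch assumes that a `local` scheme (or no scheme) means the jar is already inside the image. It relies only on the JDK's built-in URL handlers, so it covers `file` and `http`; schemes such as `s3`, `oss`, or `hdfs` would need a real file system implementation and fail here with an `IOException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: copy a user jar to a local directory before cluster start. */
public class ArtifactFetchSketch {

    /** Returns true when the jar has to be copied before the cluster starts. */
    public static boolean needsFetch(URI jarUri) {
        String scheme = jarUri.getScheme();
        // Assumption: "local" (or no scheme) means the jar already ships in the image.
        return scheme != null && !"local".equals(scheme);
    }

    /** Copies the jar into targetDir and returns the local path to use afterwards. */
    public static Path fetch(URI jarUri, Path targetDir) throws IOException {
        if (!needsFetch(jarUri)) {
            return Paths.get(jarUri.getPath());
        }
        Files.createDirectories(targetDir);
        String fileName = Paths.get(jarUri.getPath()).getFileName().toString();
        Path target = targetDir.resolve(fileName);
        // JDK URL handlers cover file and http(s) only; other schemes
        // raise MalformedURLException, which is an IOException.
        try (InputStream in = jarUri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Usage example with a temporary file standing in for the user jar.
        Path source = Files.createTempFile("job", ".jar");
        Files.write(source, new byte[] {1, 2, 3});
        Path targetDir = Files.createTempDirectory("artifacts");
        Path fetched = fetch(source.toUri(), targetDir);
        System.out.println("Fetched to " + fetched);
    }
}
```

Keeping the scheme check separate from the copy lets a deployment path skip the download entirely for jars that are already in the image.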

Verifying this change

This change added tests and can be verified as follows:

Removed the testDeployApplicationClusterWithNonLocalSchema test
Added tests that fetch the jar via the http scheme
Added tests that fetch the jar via the file scheme
Added a test that creates an emptyDir for saving user artifacts
Manually verified local, file, OSS, HDFS with Kerberos, and S3 resources.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: Native Kubernetes Application Mode: yes
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? docs

Sep 07 '22 17:09 SwimSweet

CI report:

c0efcfe43c465daf45fff83154d7469fbd1e4f93 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

Sep 07 '22 17:09 flinkbot

I created a new PR based on master. Please take a look when you are free, @wangyang0918 @Aitozi. Thanks.

Sep 08 '22 06:09 SwimSweet

@SwimSweet Sorry for the late response. I believe this PR could work. However, my biggest concern is that it only works for native K8s application mode. AFAIK, the Yarn application mode and standalone mode should also benefit from this.

Oct 26 '22 03:10 wangyang0918

@wangyang0918 I will work on supporting Yarn application mode and standalone mode. I found that Yarn application mode already has a similar feature, which provides the yarn.provided.lib.dirs and yarn.provided.usrlib.dir parameters. But it seems that this feature only supports the Hadoop file system?

Oct 26 '22 16:10 SwimSweet

@SwimSweet Yes. The user jar for Yarn application mode can only be an HDFS file. However, I believe that is enough, since using the Yarn distributed cache is more appropriate than downloading via http or the Flink filesystem directly. This also means that we just need to add support for standalone mode in this PR.

Oct 27 '22 02:10 wangyang0918

@flinkbot run azure

Nov 17 '22 08:11 SwimSweet

@flinkbot run azure

Nov 19 '22 09:11 SwimSweet

@wangyang0918 I have finished the work for standalone mode. Please take another look.

Nov 20 '22 01:11 SwimSweet

@wangyang0918 Please take a look. Thanks.

Feb 12 '23 16:02 SwimSweet

Hi @SwimSweet, given your lack of response, @ferenc-csaky offered to build on your work and take it forward based on my comments above. Hope that is OK. He is planning to post an updated PR next week.

Jan 05 '24 10:01 mbalassi

Closing this as the relevant work has been merged as part of #24065.

Jan 25 '24 08:01 mbalassi