[FLINK-28915] Support fetching artifacts in native K8s and standalone application mode
What is the purpose of the change
Support fetching a remote job JAR and additional artifacts (UDFs, formats, dependencies, etc.) in native Kubernetes and standalone application mode. The current change contains fetchers for DFS (via the Flink FS abstraction) and HTTP. It builds on prior work from #20779 and addresses the comments on that PR.
Brief change log
- In standalone app mode, a --jars option is added. It accepts any number of arguments, all of which are fetched before the Flink cluster starts. Example:
./bin/standalone-job.sh start-foreground \
--jars http://localhost:9999/flink-sandbox.jar http://localhost:9999/test-udf.jar \
--job-classname org.apache.flink.DummyJob
- In native K8s app mode, the user can define additional artifacts via the user.artifacts.artifact-list property. Example:
./bin/flink run-application \
--target kubernetes-application \
-Dkubernetes.cluster-id=flink-cluster \
-Dkubernetes.container.image.ref=flink \
-Duser.artifacts.artifact-list=http://host.minikube.internal:9999/test-udf.jar \
http://host.minikube.internal:9999/flink-sandbox.jar
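To illustrate what "fetching before the cluster starts" amounts to, here is a minimal sketch of downloading one artifact from a URI into a local directory. It is not the fetcher code added by this PR: the class name ArtifactFetchSketch and the method fetch are hypothetical, and it uses plain java.net and java.nio rather than the Flink FS abstraction mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch; not the fetcher classes added by this PR. */
final class ArtifactFetchSketch {

    /** Copies the resource at {@code uri} into {@code targetDir}, keeping its file name. */
    static Path fetch(URI uri, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        // Assumption: the last path segment of the URI is a usable file name.
        String path = uri.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Path target = targetDir.resolve(fileName);
        // URL.openStream() handles http:, https: and file: URIs alike.
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("artifacts");
        // Each command-line argument is treated as one artifact URI.
        for (String arg : args) {
            System.out.println(fetch(URI.create(arg), dir));
        }
    }
}
```

Running the sketch with the two URLs from the standalone example would place flink-sandbox.jar and test-udf.jar in a temporary directory, assuming a server is listening on localhost:9999.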
Verifying this change
This change added tests and can be verified as follows:
- Added tests for artifact fetching utils.
- Added tests for artifact fetching logic.
- Added tests to cover the changes in DefaultPackagedProgramRetriever.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: Artifact deployment in native Kubernetes app mode.
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? docs
CI report:
- 85e19c8c7f1ae1ba2384e33eb677b666d0b02ef8 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
Thanks @ferenc-csaky, I like this approach to the feature, and thanks for preserving the prior work of @SwimSweet in separate commits (I might squash his 2 into a single commit later). After a cursory look: could you please update the docs too? I will review further later.
Thank you for the review @mbalassi! Yes, I'll update the docs. I would also like to highlight one aspect of the current implementation that might deserve further discussion:
Currently, for standalone app mode, the --jars arg handles both the job JAR and any additional artifacts, which is a nice approach IMO, although it has some downsides:
- The number of arguments is unlimited (as a job can have N additional dependencies), so if someone puts the --jars CLI option last and tries to pass positional args afterwards, all of those will be handled as "jars", for example ... --jars myjob.jar posarg1 posarg2. This behavior could cause some confusion.
- Because the additional artifacts are not differentiated from the job JAR, the fetching logic gets a bit more complicated.
I started thinking about using the newly introduced option as a dynamic property for standalone mode as well. WDYT?
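The first downside can be reproduced with a few lines of code. The sketch below is not Flink's CLI parser; GreedyJarsSketch and parseJars are hypothetical names, and the rule "consume tokens until the next one starting with --" is an assumption about how an unlimited-argument option typically behaves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a greedy multi-value option; not Flink's actual CLI parser. */
final class GreedyJarsSketch {

    /** Collects every token after "--jars" until the next token that starts with "--". */
    static List<String> parseJars(String[] args) {
        List<String> jars = new ArrayList<>();
        boolean inJars = false;
        for (String arg : args) {
            if (arg.equals("--jars")) {
                inJars = true;          // start consuming values
            } else if (arg.startsWith("--")) {
                inJars = false;         // another option ends the value list
            } else if (inJars) {
                jars.add(arg);          // positional args land here too
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        String[] cli = {
            "--job-classname", "org.apache.flink.DummyJob",
            "--jars", "myjob.jar", "posarg1", "posarg2"
        };
        System.out.println(parseJars(cli)); // prints [myjob.jar, posarg1, posarg2]
    }
}
```

Because nothing marks the end of the value list, posarg1 and posarg2 are indistinguishable from JARs once --jars is the last option.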
@ferenc-csaky Thanks for the clarification. Regarding your concern about accidentally passing additional arguments as jars: we have to be very careful, because the user jars themselves often need arguments, so this can have unintended consequences. I would like to ask you to expect the --jars argument as a comma-separated list instead. This is exactly what Spark does.
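A minimal sketch of the comma-separated variant follows, assuming --jars takes exactly one value. The names CommaSeparatedJarsSketch, splitJars and remainingArgs are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a single-value, comma-separated "--jars" option. */
final class CommaSeparatedJarsSketch {

    /** Splits a value such as "a.jar,b.jar" into its non-empty, trimmed entries. */
    static List<String> splitJars(String value) {
        List<String> jars = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                jars.add(trimmed);
            }
        }
        return jars;
    }

    /** Returns the tokens that are neither "--jars" nor its single value. */
    static List<String> remainingArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--jars") && i + 1 < args.length) {
                i++; // skip exactly one value, never more
            } else {
                rest.add(args[i]);
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        String[] cli = {"--jars", "myjob.jar,test-udf.jar", "posarg1", "posarg2"};
        System.out.println(splitJars(cli[1]));   // prints [myjob.jar, test-udf.jar]
        System.out.println(remainingArgs(cli));  // prints [posarg1, posarg2]
    }
}
```

Since the option consumes a single token, positional arguments for the job stay separate regardless of where --jars appears on the command line.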
Nice, thank you for going over the comments. It looks good to me ✅
@flinkbot run azure