
[FLINK-28915] Support fetching artifacts in native K8s and standalone application mode

Open ferenc-csaky opened this issue 1 year ago • 2 comments

What is the purpose of the change

Support fetching the remote job JAR and additional artifacts (UDFs, formats, dependencies, etc.) in native Kubernetes and standalone application mode. This change adds fetchers for DFS (via the Flink FS abstraction) and HTTP. It builds on prior work from #20779 and addresses the comments on that PR.
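The split between an HTTP fetcher and a DFS fetcher (through the Flink FS abstraction) essentially means dispatching on the artifact URI's scheme. Below is a minimal Java sketch under that assumption; the class, enum, and method names are hypothetical and are not the classes introduced by this PR.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical sketch: pick a fetcher kind from an artifact URI's scheme.
 * Names are illustrative only; they are not the classes added by this PR.
 */
public class ArtifactFetcherSelection {

    /** The two fetcher kinds mentioned in the PR description. */
    enum FetcherKind { HTTP, DFS }

    static FetcherKind selectFetcher(URI uri) {
        String scheme = uri.getScheme();
        if (scheme != null) {
            String normalized = scheme.toLowerCase(Locale.ROOT);
            if (normalized.equals("http") || normalized.equals("https")) {
                return FetcherKind.HTTP;
            }
        }
        // Assumption: every other URI (hdfs://, s3://, scheme-less paths, ...)
        // is resolved through the Flink file system abstraction.
        return FetcherKind.DFS;
    }

    public static void main(String[] args) {
        String[] artifacts = {
            "http://localhost:9999/test-udf.jar",
            "hdfs:///jars/flink-sandbox.jar",
            "/opt/flink/usrlib/flink-sandbox.jar"
        };
        for (String artifact : artifacts) {
            System.out.println(artifact + " -> " + selectFetcher(URI.create(artifact)));
        }
    }
}
```

Keeping the decision in one place like this means adding another fetcher later would only touch the dispatch, not the callers.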

Brief change log

  • In standalone app mode, a --jars option is added that accepts an unlimited number of arguments; the listed JARs are fetched before the Flink cluster starts. Example:
./bin/standalone-job.sh start-foreground \
  --jars http://localhost:9999/flink-sandbox.jar http://localhost:9999/test-udf.jar \
  --job-classname org.apache.flink.DummyJob
  • In native K8s app mode, the user can define additional artifacts via the user.artifacts.artifact-list property. Example:
./bin/flink run-application \
  --target kubernetes-application \
  -Dkubernetes.cluster-id=flink-cluster \
  -Dkubernetes.container.image.ref=flink \
  -Duser.artifacts.artifact-list=http://host.minikube.internal:9999/test-udf.jar \
  http://host.minikube.internal:9999/flink-sandbox.jar

Verifying this change

This change added tests and can be verified as follows:

  • Added tests for artifact fetching utils.
  • Added tests for artifact fetching logic.
  • Added tests to cover the changes in DefaultPackagedProgramRetriever.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: Artifact deployment in native Kubernetes app mode.
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

ferenc-csaky · Jan 10 '24 18:01

CI report:

  • 85e19c8c7f1ae1ba2384e33eb677b666d0b02ef8 Azure: SUCCESS
Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot · Jan 10 '24 18:01

Thanks @ferenc-csaky, I like this approach to the feature, and thanks for preserving the prior work of @SwimSweet in separate commits (I might squash his two commits into one later). On a cursory look, could you please update the docs too? I will review further later.

Thank you for the review @mbalassi! Yes, I'll update the docs. Furthermore, I would like to highlight one aspect of the current implementation that might warrant further discussion:

Currently, in standalone app mode, the --jars argument handles both the job JAR and any additional artifacts, which is a nice approach IMO, although it has some downsides:

  1. The number of arguments is unlimited (as a job can have N additional dependencies), so if someone puts the --jars CLI option last and tries to pass positional args afterwards, all of those will be handled as "jars", for example ... --jars myjob.jar posarg1 posarg2. This behavior could cause some confusion.
  2. Because the additional artifacts are not differentiated from the job JAR, the fetching logic gets a bit more complicated.
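Downside 1 can be reproduced with a toy parser. The following Java sketch is hypothetical (it is not the parser in this PR); it only shows how a greedy, multi-value --jars option swallows trailing positional job arguments.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a greedy, multi-value --jars option.
 * This is NOT the PR's parser; it only illustrates the ambiguity.
 */
public class GreedyJarsParsing {

    /** Collects every token that follows --jars until the next "--" option. */
    static List<String> greedyJars(String[] args) {
        List<String> jars = new ArrayList<>();
        boolean collecting = false;
        for (String arg : args) {
            if (arg.equals("--jars")) {
                collecting = true;
            } else if (arg.startsWith("--")) {
                // Any other option ends the --jars run (its value is ignored in this sketch).
                collecting = false;
            } else if (collecting) {
                jars.add(arg);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // The user intends posarg1 and posarg2 as job arguments...
        String[] cli = {"--jars", "myjob.jar", "posarg1", "posarg2"};
        // ...but the greedy option swallows them: [myjob.jar, posarg1, posarg2]
        System.out.println(greedyJars(cli));
    }
}
```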

I started thinking about using the newly introduced option as a dynamic property for standalone mode as well. WDYT?

ferenc-csaky · Jan 11 '24 11:01

@ferenc-csaky Thanks for the clarification. Regarding your concern about accidentally passing additional arguments as jars: we have to be very careful, as user jars themselves often need arguments, so this could have unintended consequences. Instead, I would ask you to accept the --jars argument as a comma-separated list. This is exactly what Spark does.
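A comma-separated value removes the ambiguity because --jars then consumes exactly one token. A minimal Java sketch of that parsing rule follows; the class and method names are hypothetical and this is not the implementation that ended up in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a single-value, comma-separated --jars option,
 * in the spirit of Spark's --jars. Not the PR's actual implementation.
 */
public class CommaSeparatedJarsParsing {

    /** Holds the parsed jar list and every argument that was not part of --jars. */
    static final class Parsed {
        final List<String> jars = new ArrayList<>();
        final List<String> rest = new ArrayList<>();
    }

    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--jars")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--jars requires a value");
                }
                // Exactly one token follows --jars; split it on commas.
                for (String jar : args[++i].split(",")) {
                    String trimmed = jar.trim();
                    if (!trimmed.isEmpty()) {
                        parsed.jars.add(trimmed);
                    }
                }
            } else {
                // Everything else (other options, positional job arguments) is kept as-is.
                parsed.rest.add(args[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        String[] cli = {
            "--jars", "http://localhost:9999/flink-sandbox.jar,http://localhost:9999/test-udf.jar",
            "posarg1", "posarg2"
        };
        Parsed parsed = parse(cli);
        System.out.println("jars: " + parsed.jars);
        System.out.println("rest: " + parsed.rest);
    }
}
```

With this rule, posarg1 and posarg2 stay with the job even when --jars is the last option before them.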

mbalassi · Jan 16 '24 09:01

Nice, thank you for going over the comments. It looks good to me ✅

schevalley2 · Jan 18 '24 11:01

@flinkbot run azure

ferenc-csaky · Jan 18 '24 21:01