flink-kubernetes-operator icon indicating copy to clipboard operation
flink-kubernetes-operator copied to clipboard

Support config yaml

Open ctrlaltdilj opened this issue 1 year ago • 7 comments

Support Flink 1.19 config.yaml configuration

Flink 1.19 introduced new config.yaml format. There are some dependencies on this new format in 1.19 that prevent old flink-conf.yaml from being able to configure properties.

This PR, adds support to be able to set a config.yaml file which is able to be consumed by the operator as well as deployments.

Brief change log

  • added support to be able to set config.yaml by replacing flink-conf.yaml in the defaultConfiguration

Verifying this change

This change added tests and can be verified as follows:

  • Build image
  • create a values.yaml similar to
  • Extended integration test for recovery after master (JobManager) failure
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented, will add

ctrlaltdilj avatar Jul 02 '24 12:07 ctrlaltdilj

I do think this would be better supported via a flag on configuration on FlinkDeployment, as it enables operator update independently from FlinkDeployments

I am happy to make that change and would prefer that. Any thoughts?

ctrlaltdilj avatar Jul 02 '24 12:07 ctrlaltdilj

Hi, first of all, thank you for the contribution, but for better observability please try to always create a JIRA before opening a PR. Unless trying to push a fix for a release candidate, please use the main branch for your PRs. Flink version is already 1.19.1 on the main branch.

mateczagany avatar Jul 02 '24 13:07 mateczagany

Thanks @mateczagany for all the info! I have requested a JIRA account, and will file a ticket

ctrlaltdilj avatar Jul 02 '24 13:07 ctrlaltdilj

@mateczagany Created following ticket: https://issues.apache.org/jira/browse/FLINK-35744

ctrlaltdilj avatar Jul 02 '24 14:07 ctrlaltdilj

Hey @ctrlaltdilj ! Are you still working on this PR?

gyfora avatar Oct 03 '24 09:10 gyfora

@gyfora Sorry for the late response, Yes I can make the changes you mentioned, should be able to make it automatic based on the flink version being requested during deployments

Should have time to update this draft sometime later today

ctrlaltdilj avatar Oct 03 '24 13:10 ctrlaltdilj

Hi , i made changes to helm template and in values set

defaultConfiguration:
  standardYaml: true

The operator dont start pod with errror

Exception in thread "main" org.apache.flink.configuration.IllegalConfigurationException: The Flink config file '/opt/flink/conf/flink-conf.yaml' (/opt/flink/conf/flink-conf.yaml) does not exist.

myskaludek avatar Oct 25 '24 11:10 myskaludek

Hi @ctrlaltdilj any updates on this? Is there a chance of this moving forward? It would be a great addition!

hugovideira-ppb avatar Nov 13 '24 23:11 hugovideira-ppb

Hi @hugovideira-ppb , yes, apologize got busy, I'll try to knock this out

ctrlaltdilj avatar Nov 18 '24 13:11 ctrlaltdilj

@hugovideira-ppb , I have updated the helm chart, if this approach looks good, can I proceed with some documentation changes and adding some tests?

in the values.yaml you can replace, to test locally:

flink-conf.yaml: |+

to

config.yaml: |+

ctrlaltdilj avatar Mar 08 '25 15:03 ctrlaltdilj

@gyfora does this approach look better?

ctrlaltdilj avatar Mar 10 '25 03:03 ctrlaltdilj

@gyfora @ljankows whenever you get a chance to take a look, if the approach makes sense, I can proceed with some documentation changes and adding some tests?

ctrlaltdilj avatar Mar 14 '25 19:03 ctrlaltdilj

I think this will require some more testing and some e2es to make sure that this type of configuration on the operator side still works with Flink versions.

Can we extend the e2e logic to cover this somehow?

gyfora avatar Mar 19 '25 14:03 gyfora

@gyfora that should be possible, let me take a look

ctrlaltdilj avatar Mar 19 '25 16:03 ctrlaltdilj

@gyfora added, let me know what you think, or if you have any feedback

ctrlaltdilj avatar Apr 06 '25 20:04 ctrlaltdilj