[Proposal] Unify flink configuration

Open wolfboys opened this issue 2 years ago • 3 comments

Search before asking

  • [X] I searched the existing feature requests and found no similar requirement.

Description

Currently, the StreamPark project has no unified specification for parameter settings. This proposal defines one for the whole configuration. It covers the job's environment settings (stream env | table env), the user's parameters, and the user's Flink SQL content.

before:

flink:
  deployment:
    property:
       ${StreamExecutionEnvironment.key} : $value
  # table
  table:
    planner: blink # (blink|old|any)
    mode: streaming #(batch|streaming)

after:

env:
  option: # cli option args
    target: yarn-application # yarn-application, yarn-perjob
    shutdownOnAttachedExit:
    jobmanager:
    ...
  property: 
    ${StreamExecutionEnvironment.key} : $value
    ...
    table: 
      ${TableEnvironment.key} : $value
      ...
sql: # flinksql
  my_flinksql: |
    CREATE TABLE datagen (
      f_sequence INT,
      ts AS localtimestamp,
      WATERMARK FOR ts AS ts
    ) WITH (
      ....
    );
    ...

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

wolfboys, Oct 17 '22 06:10

Hi @wolfboys , thanks for your great proposal.

I have some questions:

  1. Why do you write the SQL content in the config file? As I understand it, the proposal wants to unify the table config, right?
  2. For the table config, could we use env.table-property as the prefix? In other words, I don't think env.property.table is a good idea, because all configs under env.property will be passed to the Flink Env.

Using table.exec.mini-batch.enabled as an example:

env:
  option: # cli option args
    target: yarn-application # yarn-application, yarn-perjob
    shutdownOnAttachedExit:
    jobmanager:
    ...
  property: 
    ${StreamExecutionEnvironment.key} : $value
    ...
  table-property: 
    table.exec.mini-batch.enabled : true
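
To make the suggestion above concrete, here is a minimal sketch of how a loader could treat `env.property` and `env.table-property` as two independent maps. It is only an illustration under stated assumptions: the function name `split_by_sibling_key` is hypothetical, the dict stands in for the parsed YAML, and `parallelism.default` is an illustrative replacement for the placeholder key. It is not StreamPark's actual implementation.

```python
from typing import Any, Dict, Tuple


def split_by_sibling_key(env: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (flink_env_conf, table_conf) from a parsed `env` section."""
    # Everything under env.property goes to the Flink Env unchanged.
    flink_env_conf = dict(env.get("property") or {})
    # Everything under env.table-property is table config; keys stay verbatim,
    # so no assumption is made about which prefix Flink uses.
    table_conf = dict(env.get("table-property") or {})
    return flink_env_conf, table_conf


# Hypothetical parsed form of the YAML above.
env = {
    "property": {"parallelism.default": 2},
    "table-property": {"table.exec.mini-batch.enabled": True},
}
flink_env_conf, table_conf = split_by_sibling_key(env)
print(flink_env_conf)
print(table_conf)
```

With this layout the loader never has to inspect or rewrite a key name; it only needs to know which section the key came from.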

1996fanrui, Oct 18 '22 14:10

Hi @1996fanrui,

Table config is defined under env.property.table. Everything else under env.property is passed to the Flink Env (the table prefix is excluded), e.g.:

env:
  property: 
    ${key1} : ${value1}
    table: 
      ${key2} : ${value2}

${key1} is a Flink Env config. ${key2} is a Flink table config, not a Flink Env config. All configurations with the env.property.table prefix are Flink table configs.

Using table.exec.mini-batch.enabled as an example:

env:
  option: # cli option args
    target: yarn-application # yarn-application, yarn-perjob
    shutdownOnAttachedExit:
    jobmanager:
  property: 
    taskmanager.numberOfTaskSlots: 1
    parallelism.default: 2
    table: 
      exec.mini-batch.enabled : true
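
The split described above could be sketched roughly as follows. This is a hedged illustration, not StreamPark's actual code: the function name `split_env_property` is hypothetical, the dict stands in for the parsed `env.property` section, and re-adding the `table.` prefix to nested keys is an assumption inferred from the example (where `exec.mini-batch.enabled` under `table` stands for `table.exec.mini-batch.enabled`).

```python
from typing import Any, Dict, Tuple


def split_env_property(prop: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a parsed env.property section into (flink_env_conf, table_conf)."""
    # Every top-level key except the nested "table" map is a Flink Env config.
    flink_env_conf = {k: v for k, v in prop.items() if k != "table"}
    # Assumption: keys nested under "table" get the "table." prefix back
    # to form the full Flink table option name.
    nested = prop.get("table") or {}
    table_conf = {f"table.{k}": v for k, v in nested.items()}
    return flink_env_conf, table_conf


# Parsed form of env.property from the example above.
prop = {
    "taskmanager.numberOfTaskSlots": 1,
    "parallelism.default": 2,
    "table": {"exec.mini-batch.enabled": True},
}
flink_env_conf, table_conf = split_env_property(prop)
print(flink_env_conf)
print(table_conf)
```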

wolfboys, Oct 18 '22 14:10

Hi @wolfboys ,

If the prefix of a Flink table config isn't table, what can we do? StreamPark should not be affected by Flink parameter naming.

For example, the prefix of some table configs is sql-client.

https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#sql-client-display-max-column-width
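
A small sketch of the concern, under the assumption that a loader restores the `table.` prefix for every key nested under `env.property.table` (the helper name `reprefix_table_keys` and the value `40` are hypothetical, chosen only for illustration):

```python
from typing import Any, Dict


def reprefix_table_keys(nested: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed behavior: prepend "table." to every key nested under env.property.table."""
    return {f"table.{k}": v for k, v in nested.items()}


nested = {
    "exec.mini-batch.enabled": True,
    # Taken from the linked Flink 1.15 docs anchor; its prefix is "sql-client", not "table".
    "sql-client.display.max-column-width": 40,
}
result = reprefix_table_keys(nested)
for key in sorted(result):
    print(key)
```

Under that assumption the second key would come out as `table.sql-client.display.max-column-width`, which is not the option name Flink documents, so the nesting scheme depends on Flink's naming.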

BTW, could you share this information on the mailing list? That way more developers can join the discussion.

1996fanrui, Oct 18 '22 15:10