seatunnel [ST-Engine][LogicalPlan] Generate LogicalPlan from JobConfig file

Search before asking

[X] I had searched in the feature and found no similar feature requirement.

Description

We define job pipeline by job config file, So the first thing the SeaTunnelClient need to do parse the job config file and generate the action list. action is similar to the operator in Flink, It is an encapsulation of the SeaTunnel API. An action contains an instance of SeaTunnelSource or SeaTunnelTransform or SeaTunnelSink. Every action need to know the upstream of it self.

public interface Action extends Serializable {
    @NonNull
    String name();

    void setName(@NonNull String name);

    @NonNull
    List<Action> upstream();

    void addUpstream(@NonNull Action action);
}

We support SourceAction 、SinkAction and TransformAction yet.

We have two profile to parse job config file.

If there is only one Source and one Sink and at most one Transform, We simply build actions pipeline in the following order.

If there are multiple sources or multiple transforms or multiple sink, We will rely on source_table_name and result_table_name to build actions pipeline. So in this case result_table_name is necessary for the Source Connector and all of result_table_name and source_table_name are necessary for Transform Connector. By the end, source_table_name is necessary for Sink Connector.

Parallelism

In SeaTunnel Engine, only Source and PartitionTransform Connector support set parallelism. The action which can not set parallelism will extends the upstream action parallelism. If an action have more than one upstream action, The parallelism of this action is the sum of the parallelism of its upstream action. Because if an action's parallelism equals to it's upstream's parallelism, the action can be chain into it's upstream action and run in a same subtask.

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

[X] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jul 25 '22 05:07 EricJoy2048

hi ,please Why does the parallelism of the parallel transformation change from 4 to 2?

Jul 27 '22 13:07 2013650523

hi ,please Why does the parallelism of the parallel transformation change from 4 to 2?

In our design, Only Source and PartitionTransform Connector can set parallelism. The PartitionTransform Connector is used to repartition data and parallelism is the target partition number.

Jul 28 '22 01:07 EricJoy2048

I can't understand what PartitionTransform does in the diagram. If there is a mechanism for Transform4 to distribute data to PartitionTransform, then why don't we distribute to Sink directly? Wouldn't it be simpler to let the user specify the parallelism of the Sink in this case?

Jul 28 '22 08:07 ashulin

seatunnel seatunnel copied to clipboard

[ST-Engine][LogicalPlan] Generate LogicalPlan from JobConfig file

Search before asking

Description

We have two profile to parse job config file.

Parallelism

Usage Scenario

Related issues

Are you willing to submit a PR?

Code of Conduct

seatunnel
seatunnel copied to clipboard