[Umbrella] SeaTunnel Transform V2 Design

Open hailin0 opened this issue 3 years ago • 6 comments

Code of Conduct

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Describe the proposal

Background: #2678

Currently, the transform code is bound to a single engine and cannot be shared with other engines.

I propose that we create a transform-v2 module to unify the transform implementation. Like source and sink, it would be decoupled from the engine and able to run on different engines.

Furthermore, we can use the translation module to integrate transforms so that they execute on the SeaTunnel, Flink, and Spark engines.

To keep SeaTunnel positioned as a data integration platform and avoid introducing work beyond the plan, transform-v2 will only support UDF-level data conversion and will not support SQL transforms (because st-engine does not support SQL parsing and analysis).
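As a rough illustration of "UDF-level" and "decoupled from the engine", here is a minimal sketch. The `RowTransform` interface and the `UpperCaseField` class are hypothetical names made up for this example, not the SeaTunnel API, and a row is modeled as a plain `Object[]`.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical engine-independent contract (assumption): one row in, one row out.
interface RowTransform {
    Object[] map(Object[] row);
}

// Example UDF-level transform: upper-cases the string field at a given index.
final class UpperCaseField implements RowTransform {
    private final int index;

    UpperCaseField(int index) {
        this.index = index;
    }

    @Override
    public Object[] map(Object[] row) {
        // Copy the row so the input is left untouched.
        Object[] out = Arrays.copyOf(row, row.length);
        if (out[index] instanceof String) {
            out[index] = ((String) out[index]).toUpperCase(Locale.ROOT);
        }
        return out;
    }
}
```

Nothing in this sketch refers to an engine type, which is the property the proposal is after.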

Motivation

  • Supports running on different engines
  • Supports updating field datatypes, values, and order
  • Supports adding and deleting fields
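The field-level operations listed above could look roughly like the following sketch. It assumes a row modeled as an insertion-ordered `Map<String, Object>`; the `FieldOps` class and its method names are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers (assumption): a row is a LinkedHashMap of field name to value.
final class FieldOps {
    // Update datatype and value: parse a field's string form into an Integer.
    static Map<String, Object> castToInt(Map<String, Object> row, String field) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        out.put(field, Integer.valueOf(String.valueOf(row.get(field))));
        return out;
    }

    // Add a field; a new name is appended after the existing fields.
    static Map<String, Object> addField(Map<String, Object> row, String name, Object value) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        out.put(name, value);
        return out;
    }

    // Delete a field.
    static Map<String, Object> removeField(Map<String, Object> row, String name) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        out.remove(name);
        return out;
    }

    // Change field order: emit the fields in the requested order.
    static Map<String, Object> reorder(Map<String, Object> row, List<String> order) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : order) {
            out.put(name, row.get(name));
        }
        return out;
    }
}
```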

Overall Design

The basic Transform process consists of:

  • Transform implementation
  • Transform translation layer
    • Adapt to flink engine
    • Adapt to spark engine
    • Adapt to seatunnel engine
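One way to picture the translation layer is as a set of thin adapters around a single transform. In the sketch below every name is hypothetical: `RowTransform` stands for the engine-independent transform, and `EngineAMapFunction` / `EngineBMapFunction` are local stand-ins for engine-specific operator interfaces (real engines define their own, with their own method names and signatures).

```java
// Hypothetical engine-independent transform (assumption, not the SeaTunnel API).
interface RowTransform {
    Object[] map(Object[] row);
}

// Local stand-ins for two engines' operator interfaces (assumption).
interface EngineAMapFunction<I, O> {
    O map(I value);
}

interface EngineBMapFunction<I, O> {
    O call(I value);
}

// Translation layer: wraps the same transform in each engine's operator shape.
final class TransformAdapters {
    static EngineAMapFunction<Object[], Object[]> forEngineA(RowTransform transform) {
        return transform::map;
    }

    static EngineBMapFunction<Object[], Object[]> forEngineB(RowTransform transform) {
        return transform::map;
    }
}
```

The transform is written once; only the adapters know about an engine.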

[Figure: Transform]

[Figure: Translation layer]

Task list

Translation layer

  • [x] #3145
  • [ ] #3267
  • [ ] #3268

Transform

  • [ ] Substring transform
  • [ ] Convert date & time & timestamp transform

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

hailin0, Oct 25 '22 05:10

Does the transform method use functions? Can it support SQL?

yuangjiang, Oct 25 '22 09:10

@yuangjiang Transform operates directly on the engine's stream<row>. SQL is currently unsupported, but the same features can be achieved.
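As an illustration only, a simple SQL projection with a filter can be expressed as functions over a stream of rows. The sketch uses `java.util.stream.Stream` as a stand-in for an engine's row stream; the `SqlLikeTransforms` class and the row layout `[name, age]` are assumptions made for this example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example (assumption): each row is [name (String), age (Integer)].
final class SqlLikeTransforms {
    // Roughly equivalent to: SELECT name FROM rows WHERE age > minAge
    static List<Object[]> namesOlderThan(Stream<Object[]> rows, int minAge) {
        return rows
                .filter(row -> ((Integer) row[1]) > minAge) // WHERE age > minAge
                .map(row -> new Object[] {row[0]})          // SELECT name
                .collect(Collectors.toList());
    }
}
```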

hailin0, Oct 25 '22 10:10

I suggest supporting sending dirty data to an extra sink.

iture123, Oct 25 '22 10:10

Good idea. This is a separate feature, data partitioning: selected data rows are sent to a specified sink.
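A minimal sketch of what such routing might look like, assuming that "dirty" means a row whose transform throws. The `DirtyDataRouter` class is hypothetical, and the two lists are stand-ins for the main sink and the extra sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical router (assumption): rows that fail the transform go to a separate sink.
final class DirtyDataRouter {
    final List<Object[]> clean = new ArrayList<>(); // stand-in for the main sink
    final List<Object[]> dirty = new ArrayList<>(); // stand-in for the extra sink

    void process(List<Object[]> rows, Function<Object[], Object[]> transform) {
        for (Object[] row : rows) {
            try {
                clean.add(transform.apply(row));
            } catch (RuntimeException e) {
                // Keep the original row so it can be inspected or replayed later.
                dirty.add(row);
            }
        }
    }
}
```

The same shape would cover general data partitioning if the `try`/`catch` were replaced by a predicate that selects the target sink.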

hailin0, Oct 26 '22 14:10

@hailin0 Can we describe the relationship between transforms? For example, transform1 and transform2 run in parallel, and transform3 uses the output of both transform1 and transform2 to do the filter.

hk-lrzy, Oct 28 '22 08:10

For reference, see https://seatunnel.apache.org/docs/concept/config#other

hailin0, Nov 02 '22 03:11

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot], Dec 03 '22 00:12

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

github-actions[bot], Dec 13 '22 00:12