[e2e] How about generating a task like fake2Mysql that produces all data types for a source, plus 1 million (100W) rows of data to test performance?

Open laglangyue opened this issue 3 years ago • 15 comments

Search before asking

  • [X] I had searched in the feature and found no similar feature requirement.

Description

Only the dm e2e test generates all data types, and it does so with only 1 row, because it is not easy to generate difficult data for every column. So I suggest we generate the data with a SeaTunnel task. We can add corresponding annotations based on the JUnit 5 extension model.

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

laglangyue avatar Sep 08 '22 14:09 laglangyue

This is an important feature that I want to discuss too. I think we can upgrade the fake connector to support generating a specified amount of data of a specified type. BTW, we could also control how many pieces of data there are in a field.

TyrantLucifer avatar Sep 09 '22 04:09 TyrantLucifer

It's so great! But performance tests don't fit here, and almost no one runs performance tests all the time.

CalvinKirs avatar Sep 09 '22 07:09 CalvinKirs

Also, how do we cover all data types from the various data sources? Is there a specific solution? Current test coverage is not comprehensive.

CalvinKirs avatar Sep 09 '22 07:09 CalvinKirs

For JDBC, the DDL can be run from a configuration file, the data can be generated with a fake2source job, and the result can then be verified with a source2assert job. This can be abstracted as a JUnit 5 extension.

For other connectors we need a similar process. How can we use a conf file to run the DDL for them?

laglangyue avatar Sep 09 '22 12:09 laglangyue

Maybe we can talk about the all-type test first. We can put the performance test on hold.

laglangyue avatar Sep 09 '22 12:09 laglangyue

I agree

hailin0 avatar Sep 13 '22 05:09 hailin0

@laglangyue

Thoughts on enhancing FakeSource:

  • Support defining the row type
  • Support defining the total number of rows of data
  • Support directly defining data values (rows)
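
To make the three points above concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: `FakeRowSketch`, `FieldType`, `generate`, and `fromRows` are hypothetical names, not the FakeSource API, and the set of supported types is deliberately tiny.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; none of these names are SeaTunnel APIs.
public class FakeRowSketch {

    /** Column types this sketch knows how to fake (an assumed, incomplete set). */
    enum FieldType { INT, STRING, BOOLEAN, DOUBLE }

    /** Generates rowNum rows whose columns follow the given row type (name -> type). */
    static List<Map<String, Object>> generate(Map<String, FieldType> schema, int rowNum, long seed) {
        Random random = new Random(seed); // fixed seed keeps the test data reproducible
        List<Map<String, Object>> rows = new ArrayList<>(rowNum);
        for (int i = 0; i < rowNum; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, FieldType> field : schema.entrySet()) {
                row.put(field.getKey(), randomValue(field.getValue(), random));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Returns a copy of explicitly defined rows, for cases where exact values matter. */
    static List<Map<String, Object>> fromRows(List<Map<String, Object>> rows) {
        return new ArrayList<>(rows);
    }

    private static Object randomValue(FieldType type, Random random) {
        switch (type) {
            case INT: return random.nextInt();
            case STRING: return "str_" + random.nextInt(1000);
            case BOOLEAN: return random.nextBoolean();
            case DOUBLE: return random.nextDouble();
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, FieldType> schema = new LinkedHashMap<>();
        schema.put("id", FieldType.INT);
        schema.put("name", FieldType.STRING);
        generate(schema, 3, 42L).forEach(System.out::println);
    }
}
```

Seeding the generator is a design choice for this sketch: a source-to-assert job can only check values if the fake data is reproducible between runs.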

hailin0 avatar Sep 13 '22 05:09 hailin0

Great, I can do something for this issue.

liugddx avatar Sep 13 '22 11:09 liugddx

With the addition of SeaTunnel Engine, every connector needs e2e tests for three engines. This is a difficult and boring job for developers, so I think it's time to add an automated test framework for SeaTunnel. The test framework needs to support some key features:

  1. Automatically generate job configuration files based on the information provided by the connector. If the connector is a source connector, we need to determine whether a matching sink connector already exists. If it does, we need to generate two jobs: the first generates test data and looks like FakeSource -> XxxSink, and the second is XxxSource -> AssertSink. On the other hand, if the connector is a sink connector, we need to determine whether a matching source connector already exists. If it does, we again need to generate two jobs: FakeSource -> XxxSink and XxxSource -> AssertSink.
  2. Test all engines automatically. The test framework needs to run the jobs on every engine SeaTunnel currently supports.
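
The two features above could be sketched roughly as follows. This is a hedged illustration, not a design: `E2ePlanSketch`, `jobsFor`, `plan`, and the engine names in `ENGINES` are all hypothetical, and returning no jobs when the counterpart connector is missing is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; class, method, and engine names are illustrative, not SeaTunnel APIs.
public class E2ePlanSketch {

    /** Engines to run every generated job on (names are assumptions). */
    static final List<String> ENGINES = Arrays.asList("spark", "flink", "seatunnel");

    /**
     * Builds the two-job pipeline for a connector: job 1 writes fake data into the
     * external system, job 2 reads it back and asserts on it. If the source or the
     * sink side is missing, this sketch generates nothing (an assumption).
     */
    static List<String> jobsFor(String connector, boolean hasSource, boolean hasSink) {
        if (!hasSource || !hasSink) {
            return Collections.emptyList();
        }
        return Arrays.asList(
                "FakeSource -> " + connector + "Sink",
                connector + "Source -> AssertSink");
    }

    /** Expands the job pair into one entry per engine, e.g. "spark: FakeSource -> JdbcSink". */
    static List<String> plan(String connector, boolean hasSource, boolean hasSink) {
        List<String> plan = new ArrayList<>();
        for (String engine : ENGINES) {
            for (String job : jobsFor(connector, hasSource, hasSink)) {
                plan.add(engine + ": " + job);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        plan("Jdbc", true, true).forEach(System.out::println);
    }
}
```

A real framework would emit job configuration files rather than strings; the strings only show the shape of the plan (two jobs per connector, repeated per engine).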

@hailin0 @CalvinKirs @getChan @531651225 @TyrantLucifer @2013650523 @chessplay Do you have any suggestions?

EricJoy2048 avatar Sep 13 '22 13:09 EricJoy2048

  2. Test all engines automatically. The test framework needs to run the jobs on every engine SeaTunnel currently supports.

@EricJoy2048 How about we tackle this first? Automatically translating connector-e2e tests so they execute on multiple engines is easier to implement and has a higher priority.

hailin0 avatar Sep 14 '22 13:09 hailin0

This is a good suggestion, but it can be difficult to implement. There will be different configuration files and parameters for different connectors, and different connectors will require different docker images. I don't really understand if the program automatically generates the configuration file or if the whole process code has to be generated automatically.

TyrantLucifer avatar Sep 14 '22 13:09 TyrantLucifer

  2. Test all engines automatically. The test framework needs to run the jobs on every engine SeaTunnel currently supports.

@EricJoy2048 How about we tackle this first? Automatically translating connector-e2e tests so they execute on multiple engines is easier to implement and has a higher priority.

Agreed. I have finished the e2e for Spark, and I just copied the code for flink-e2e.

laglangyue avatar Sep 14 '22 13:09 laglangyue

This is a good suggestion, but it can be difficult to implement. There will be different configuration files and parameters for different connectors, and different connectors will require different docker images. I don't really understand if the program automatically generates the configuration file or if the whole process code has to be generated automatically.

The initialization of the data sources is the same for every engine, such as Spark, Flink, and ST-Engine.

laglangyue avatar Sep 14 '22 13:09 laglangyue

This is a good suggestion, but it can be difficult to implement. There will be different configuration files and parameters for different connectors, and different connectors will require different docker images. I don't really understand if the program automatically generates the configuration file or if the whole process code has to be generated automatically.

The initialization of the data sources is the same for every engine, such as Spark, Flink, and ST-Engine.

Yep, I agree with you, but we still have to develop the code for each connector manually; it is not generated automatically. I can understand unifying the process, but I don't understand the automatic generation.

TyrantLucifer avatar Sep 14 '22 13:09 TyrantLucifer

link https://github.com/apache/incubator-seatunnel/issues/2733

EricJoy2048 avatar Sep 15 '22 08:09 EricJoy2048

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Oct 16 '22 00:10 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Oct 24 '22 00:10 github-actions[bot]