seatunnel
[e2e] How about generating a task like fake2Mysql that produces all types for a source, and generating 1,000,000 (100W) rows of data to test performance
Search before asking
- [X] I had searched in the feature and found no similar feature requirement.
Description
Only the dm e2e test generates all types, and it generates only 1 row, because it's not easy to generate difficult data for every column. So I propose generating the data with a SeaTunnel task. We can add the corresponding annotations through a JUnit 5 extension.
Usage Scenario
No response
Related issues
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This is an important feature I want to discuss too. I think we can upgrade the fake connector to support generating a specified amount of data of a specified type. BTW, we can also control how many pieces of data there are in a field.
It's so great! But performance tests don't fit here, and almost no one runs performance tests all the time.
Also, how do we cover all data types from various data sources, is there any specific solution? Current test coverage is not comprehensive.
For JDBC, the DDL can be completed through a configuration file, the data is generated through fake2source, and the result is then verified using source2assert. This can be abstracted as a JUnit 5 extension.
For other connectors we need a similar process. How can we use a conf file to do the DDL for them?
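To make the "DDL through a configuration file" idea concrete, here is a minimal sketch of rendering a CREATE TABLE statement from a column spec. The spec layout, its key names, and the `build_create_table` helper are all assumptions made for illustration; nothing here is an existing SeaTunnel format or API.

```python
# Hypothetical column spec, as it might look after being loaded from a conf file.
# The keys ("table", "columns", "primary_key") are assumptions for this sketch.
TABLE_SPEC = {
    "table": "e2e_all_types",
    "columns": [
        {"name": "id", "type": "BIGINT", "nullable": False},
        {"name": "c_varchar", "type": "VARCHAR(255)"},
        {"name": "c_decimal", "type": "DECIMAL(20, 4)"},
    ],
    "primary_key": ["id"],
}


def build_create_table(spec):
    """Render a CREATE TABLE statement from the spec dictionary."""
    lines = []
    for col in spec["columns"]:
        # Columns are nullable unless the spec says otherwise.
        null_sql = "" if col.get("nullable", True) else " NOT NULL"
        lines.append(f"  {col['name']} {col['type']}{null_sql}")
    if spec.get("primary_key"):
        lines.append("  PRIMARY KEY (" + ", ".join(spec["primary_key"]) + ")")
    return f"CREATE TABLE {spec['table']} (\n" + ",\n".join(lines) + "\n)"


ddl = build_create_table(TABLE_SPEC)
print(ddl)
```

A JUnit 5 extension could run the generated statement before the fake2source job starts. The type names would still have to be written per database dialect, which is part of the open question for non-JDBC connectors.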
Maybe we can talk about the all-types test first. The performance test can be put on hold.
I agree
@laglangyue
Thoughts on enhancing FakeSource:
- Support defining the row type
- Support defining the total number of rows of data
- Support directly defining data values (rows)
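The three points above could be sketched roughly as follows. This is not the FakeSource implementation. The type names, the `GENERATORS` table, and the `fake_rows` signature are assumptions chosen to show the shape of the idea: a declared row type, a row count, and optional explicit rows.

```python
import random
from datetime import date, timedelta

# Hypothetical per-type generators; the type names are assumptions for this sketch.
GENERATORS = {
    "int": lambda rng: rng.randint(-2**31, 2**31 - 1),
    "bigint": lambda rng: rng.randint(-2**63, 2**63 - 1),
    "double": lambda rng: rng.uniform(-1e6, 1e6),
    "boolean": lambda rng: rng.random() < 0.5,
    "string": lambda rng: "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)),
    "date": lambda rng: date(2020, 1, 1) + timedelta(days=rng.randint(0, 365)),
}


def fake_rows(schema, row_num=0, rows=None, seed=0):
    """Yield rows for `schema`, a list of (field_name, type_name) pairs.

    If `rows` is given, those values are emitted as-is (direct definition).
    Otherwise `row_num` random rows are generated, one value per field.
    """
    if rows is not None:
        for row in rows:
            if len(row) != len(schema):
                raise ValueError("row width does not match the schema")
            yield tuple(row)
        return
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    for _ in range(row_num):
        yield tuple(GENERATORS[type_name](rng) for _, type_name in schema)


SCHEMA = [("id", "bigint"), ("name", "string"), ("born", "date"), ("active", "boolean")]
generated = list(fake_rows(SCHEMA, row_num=5))
fixed = list(fake_rows(SCHEMA, rows=[(1, "a", date(2020, 1, 1), True)]))
```

Because `fake_rows` is a generator, a large `row_num` such as the 1,000,000 rows mentioned in the issue title would be streamed one row at a time instead of being built in memory first.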
Great, I can do something for this issue.
With the addition of SeaTunnel Engine, every connector needs e2e tests for three engines. This is difficult and tedious work for developers, so I think it's time to add an automated test framework for SeaTunnel. The test framework needs to support some key features:
- Automatically generate job configuration files based on the information provided by the connector. If the connector is a source connector, we need to determine whether a sink connector already exists. If it does, we generate two jobs: the first, FakeSource -> XxxSink, generates the test data, and the second, XxxSource -> AssertSink, verifies it. On the other hand, if the connector is a sink connector, we need to determine whether a source connector already exists. If it does, we again generate two jobs: FakeSource -> XxxSink and XxxSource -> AssertSink.
- Test all engines automatically. The test framework needs to run the jobs on every engine SeaTunnel currently supports.
@hailin0 @CalvinKirs @getChan @531651225 @TyrantLucifer @2013650523 @chessplay Do you have any suggestions?
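As a rough illustration of the proposal above, the job pairs and the per-engine fan-out could be planned like this. The `plan_jobs` helper, the engine labels, and the choice to plan nothing when the counterpart connector is missing are assumptions of this sketch, not an agreed design.

```python
# Engine labels are assumptions; the thread names Spark, Flink and ST-Engine.
ENGINES = ("spark", "flink", "seatunnel")


def plan_jobs(connector, has_source, has_sink, engines=ENGINES):
    """Return the list of jobs to run for one connector.

    A round trip needs both sides: FakeSource -> XxxSink writes the test data,
    then XxxSource -> AssertSink reads it back and checks it. If either side is
    missing, this sketch plans nothing (an assumption; the thread does not say
    what should happen in that case).
    """
    if not (has_source and has_sink):
        return []
    pipelines = [
        ("FakeSource", f"{connector}Sink"),    # job 1: generate test data
        (f"{connector}Source", "AssertSink"),  # job 2: verify what was written
    ]
    # Every pipeline is repeated for every engine, write job first.
    return [
        {"engine": engine, "source": source, "sink": sink}
        for engine in engines
        for source, sink in pipelines
    ]


jobs = plan_jobs("Xxx", has_source=True, has_sink=True)
for job in jobs:
    print(f"[{job['engine']}] {job['source']} -> {job['sink']}")
```

This covers only the planning step. Rendering each entry into a real job configuration and starting the container each connector requires still has to be written per connector.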
- Test all engines automatically. The test framework needs to run the jobs on every engine SeaTunnel currently supports.
@EricJoy2048 How about we tackle this first? Automatically translating connector-e2e tests so they execute on multiple engines is easier to implement and has a higher priority.
This is a good suggestion, but it can be difficult to implement. There will be different configuration files and parameters for different connectors, and different connectors will require different docker images. I don't really understand if the program automatically generates the configuration file or if the whole process code has to be generated automatically.
Agreed. I have finished the e2e for Spark, and I just copied the code for flink-e2e.
The initialization of data sources is the same across engines such as Spark, Flink, and ST-Engine.
Yep, I agree with you, but we still have to develop the code for each connector manually; it's not automatically generated. I can understand the goal if it is to unify the process, but I can't see how the automatic generation would work.
link https://github.com/apache/incubator-seatunnel/issues/2733
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.