
[Test] [Test Hive Connector V2] Test the Hive Source Connector V2 and record the problems encountered in the test

Status: Open · EricJoy2048 opened this issue 2 years ago · 5 comments

SeaTunnel version: dev
Hadoop version: 2.10.2
Flink version: 1.12.7
Spark version: 2.4.3 (Scala 2.11.12)

Problems found to be fixed

  • [x] https://github.com/apache/incubator-seatunnel/issues/2792
  • [x] https://github.com/apache/incubator-seatunnel/issues/2794
  • [x] https://github.com/apache/incubator-seatunnel/issues/2799
  • [x] https://github.com/apache/incubator-seatunnel/issues/2803
  • [x] https://github.com/apache/incubator-seatunnel/issues/2811
  • [x] https://github.com/apache/incubator-seatunnel/issues/2812
  • [x] https://github.com/apache/incubator-seatunnel/issues/2815
  • [x] https://github.com/apache/incubator-seatunnel/issues/2473
  • [x] https://github.com/apache/incubator-seatunnel/issues/2837
  • [x] https://github.com/apache/incubator-seatunnel/issues/2868
  • [x] https://github.com/apache/incubator-seatunnel/issues/2871
  • [x] https://github.com/apache/incubator-seatunnel/issues/2894

Hive Source Connector

1. Test text file format table

create table test_hive_source(
    test_tinyint    TINYINT,
    test_smallint   SMALLINT,
    test_int        INT,
    test_bigint     BIGINT,
    test_boolean    BOOLEAN,
    test_float      FLOAT,
    test_double     DOUBLE,
    test_string     STRING,
    test_binary     BINARY,
    test_timestamp  TIMESTAMP,
    test_decimal    DECIMAL(8,2),
    test_char       CHAR(64),
    test_varchar    VARCHAR(64),
    test_date       DATE,
    test_array      ARRAY<INT>,
    test_map        MAP<STRING, FLOAT>,
    test_struct     STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING);

-- insert 10 rows into partition (test_par1='par1', test_par2='par2');
-- the single-row statement below was executed 10 times
insert into table test_hive_source partition(test_par1='par1', test_par2='par2') select 
1 as test_tinyint,
1 as test_smallint,
100 as test_int,
40000000000 as test_bigint,
true as test_boolean,
1.01 as test_float,
1.002 as test_double,
'gan' as test_string,
'DataDataData' as test_binary,
current_timestamp() as test_timestamp,
83.2 as test_decimal,
'char64' as test_char,
'varchar64' as test_varchar,
cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date) as test_date,
array(1,2) as test_array,
map("name",cast('1.11' as float),"age",cast('1.11' as float)) as test_map,
NAMED_STRUCT('street', 'London', 'city','W1a9JF','state','Finished','zip', 123) as test_struct;
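
The statement above produces a single row per execution, so reaching 10 rows means it was run 10 times. The partition can then be checked with a simple count (a minimal verification sketch against the table defined above):

-- confirm the freshly loaded partition holds the expected 10 rows
select count(*) from test_hive_source
where test_par1 = 'par1' and test_par2 = 'par2';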

-- insert 10 rows into partition (test_par1='par1', test_par2='par2_1')
-- by copying the 10 rows already in the table
insert into table test_hive_source partition(test_par1='par1', test_par2='par2_1') select
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;

-- insert 20 rows into partition (test_par1='par1_1', test_par2='par2_2');
-- the source table holds 20 rows at this point
insert into table test_hive_source partition(test_par1='par1_1', test_par2='par2_2') select
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;

Total rows: 40

rows    test_par1    test_par2
10      par1         par2
10      par1         par2_1
20      par1_1       par2_2
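
The per-partition counts above can be reproduced with a simple aggregation over the partition columns (a verification sketch against the same table):

-- count rows per partition; should return the three rows shown above
select count(*) as cnt, test_par1, test_par2
from test_hive_source
group by test_par1, test_par2;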

1.1 Test in Flink Engine

1.1.1 Job Config File

env {
  # You can set flink configuration here
  execution.parallelism = 3
  job.name="test_hive_source_to_console"
}

source {
  # This is an example input plugin **only for test and demonstrate the feature input plugin**

  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # choose stdout output plugin to output data to console
  Console {
  }

  # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

1.1.2 Submit Job Command

sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hive_to_console.conf 

1.1.3 Check Result in Flink Job Log

Some fields in the data are lost; this can be fixed in the future, see #2473. The row count is correct.

[screenshot: Flink job log output]

1.2 Test in Spark Engine

1.2.1 Job Config File

env {
  # You can set spark configuration here
  source.parallelism = 3
  job.name="test_hive_source_to_console"
}

source {
  # This is an example input plugin **only for test and demonstrate the feature input plugin**

  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # choose stdout output plugin to output data to console
  Console {
  }

  # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

1.2.2 Submit Job Command (--deploy-mode client --master local)

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master local

1.2.3 Check Data Result

Some fields in the data are lost; this can be fixed in the future. The row count is correct.

[screenshot: Spark console output, local mode]

1.2.4 Submit Job Command (--deploy-mode client --master yarn)

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master yarn

1.2.5 Check Data Result

[screenshot: Spark console output, yarn client mode]

2. Test ORC file format table

create table test_hive_source_orc(
    test_tinyint    TINYINT,
    test_smallint   SMALLINT,
    test_int        INT,
    test_bigint     BIGINT,
    test_boolean    BOOLEAN,
    test_float      FLOAT,
    test_double     DOUBLE,
    test_string     STRING,
    test_binary     BINARY,
    test_timestamp  TIMESTAMP,
    test_decimal    DECIMAL(8,2),
    test_char       CHAR(64),
    test_varchar    VARCHAR(64),
    test_date       DATE,
    test_array      ARRAY<INT>,
    test_map        MAP<STRING, FLOAT>,
    test_struct     STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
stored as orcfile;

The test data is the same as for the text file format table.
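
One way to load those 40 rows into the ORC table is to copy them from the text table with dynamic partitioning (a sketch, assuming the session allows dynamic partitions):

-- allow partition values to come from the select
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- select * on a partitioned table returns the partition columns last,
-- matching the partition spec (test_par1, test_par2)
insert into table test_hive_source_orc partition(test_par1, test_par2)
select * from test_hive_source;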

2.1 Test in Flink Engine

2.1.1 Job Config File

env {
  # You can set flink configuration here
  execution.parallelism = 3
  job.name="test_hiveorc_source_to_console"
}

source {
  # This is an example input plugin **only for test and demonstrate the feature input plugin**

  Hive {
    table_name = "test_hive.test_hive_source_orc"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # choose stdout output plugin to output data to console
  Console {
  }

  # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

2.1.2 Submit Job Command

sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hiveorc_to_console.conf

2.1.3 Check Data Result

The fields in the data are correct. [screenshot: field values in the Flink job log]

The row count is correct. [screenshot: row count in the Flink job log]

2.2 Test in Spark Engine

2.2.1 Job Config File

env {
  # You can set spark configuration here
  source.parallelism = 3
  job.name="test_hiveorc_source_to_console"
}

source {
  # This is an example input plugin **only for test and demonstrate the feature input plugin**

  Hive {
    table_name = "test_hive.test_hive_source_orc"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # choose stdout output plugin to output data to console
  Console {
  }

  # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

2.2.2 Submit Job Command

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hiveorc_to_console.conf --deploy-mode client --master local

2.2.3 Check Data Result

3. Test Parquet file format table

create table test_hive_source_parquet(
    test_tinyint    TINYINT,
    test_smallint   SMALLINT,
    test_int        INT,
    test_bigint     BIGINT,
    test_boolean    BOOLEAN,
    test_float      FLOAT,
    test_double     DOUBLE,
    test_string     STRING,
    test_binary     BINARY,
    test_timestamp  TIMESTAMP,
    test_decimal    DECIMAL(8,2),
    test_char       CHAR(64),
    test_varchar    VARCHAR(64),
    test_date       DATE,
    test_array      ARRAY<INT>,
    test_map        MAP<STRING, FLOAT>,
    test_struct     STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
stored as PARQUET;

EricJoy2048 · Sep 19 '22 07:09

Please assign it to me, I will try it.

john8628 · Sep 20 '22 01:09

> Please assign it to me, I will try it.

Which issue do you want to work on?

I am doing the test now and will record all the problems found in the test in the todo list here. You can pick the issue you want to handle.

EricJoy2048 · Sep 20 '22 01:09

> Please assign it to me, I will try it.
>
> Which issue do you want to work on?
>
> I am doing the test now and will record all the problems found in the test in the todo list here. You can pick the issue you want to handle.

I want to work on #2792, but it seems quite difficult for me.

john8628 · Sep 20 '22 05:09

> Please assign it to me, I will try it.
>
> Which issue do you want to work on? I am doing the test now and will record all the problems found in the test in the todo list here. You can pick the issue you want to handle.
>
> I want to work on #2792, but it seems quite difficult for me.

I have already fixed #2792. You can look at the other tasks; we have many issues marked as good first issue that need help.

EricJoy2048 · Sep 20 '22 06:09

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] · Nov 19 '22 00:11

This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.

github-actions[bot] · Nov 29 '22 00:11