[Test] [Test Hive Connector V2] Test the Hive Source Connector V2 and record the problems encountered in the test
SeaTunnel version: dev
Hadoop version: 2.10.2
Flink version: 1.12.7
Spark version: 2.4.3 (Scala 2.11.12)
Problems found that need to be fixed
- [x] https://github.com/apache/incubator-seatunnel/issues/2792
- [x] https://github.com/apache/incubator-seatunnel/issues/2794
- [x] https://github.com/apache/incubator-seatunnel/issues/2799
- [x] https://github.com/apache/incubator-seatunnel/issues/2803
- [x] https://github.com/apache/incubator-seatunnel/issues/2811
- [x] https://github.com/apache/incubator-seatunnel/issues/2812
- [x] https://github.com/apache/incubator-seatunnel/issues/2815
- [x] https://github.com/apache/incubator-seatunnel/issues/2473
- [x] https://github.com/apache/incubator-seatunnel/issues/2837
- [x] https://github.com/apache/incubator-seatunnel/issues/2868
- [x] https://github.com/apache/incubator-seatunnel/issues/2871
- [x] https://github.com/apache/incubator-seatunnel/issues/2894
Hive Source Connector
1. Test the text file format table
create table test_hive_source(
test_tinyint TINYINT,
test_smallint SMALLINT,
test_int INT,
test_bigint BIGINT,
test_boolean BOOLEAN,
test_float FLOAT,
test_double DOUBLE,
test_string STRING,
test_binary BINARY,
test_timestamp TIMESTAMP,
test_decimal DECIMAL(8,2),
test_char CHAR(64),
test_varchar VARCHAR(64),
test_date DATE,
test_array ARRAY<INT>,
test_map MAP<STRING, FLOAT>,
test_struct STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING);
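The CREATE TABLE above has no STORED AS clause, so the table falls back to Hive's default file format, normally TEXTFILE, which is what makes this the text file format table. Spelled out explicitly, the statement would simply end with (a sketch, assuming hive.default.fileformat has not been changed):
-- same definition, with the default text format written out explicitly
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
STORED AS TEXTFILE;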
-- insert 10 rows into partition (test_par1='par1', test_par2='par2')
insert into table test_hive_source partition(test_par1='par1', test_par2='par2') select
1 as test_tinyint,
1 as test_smallint,
100 as test_int,
40000000000 as test_bigint,
true as test_boolean,
1.01 as test_float,
1.002 as test_double,
'gan' as test_string,
'DataDataData' as test_binary,
current_timestamp() as test_timestamp,
83.2 as test_decimal,
'char64' as test_char,
'varchar64' as test_varchar,
cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date) as test_date,
array(1,2) as test_array,
map("name",cast('1.11' as float),"age",cast('1.11' as float)) as test_map,
NAMED_STRUCT('street', 'London', 'city','W1a9JF','state','Finished','zip', 123) as test_struct;
-- insert 10 rows into partition (test_par1='par1', test_par2='par2_1')
insert into table test_hive_source partition(test_par1='par1', test_par2='par2_1') select
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;
-- insert 20 rows into partition (test_par1='par1_1', test_par2='par2_2')
insert into table test_hive_source partition(test_par1='par1_1', test_par2='par2_2') select
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;
Total rows: 40
  10 rows in partition (test_par1='par1',   test_par2='par2')
  10 rows in partition (test_par1='par1',   test_par2='par2_1')
  20 rows in partition (test_par1='par1_1', test_par2='par2_2')
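Before running the connector, the per-partition row counts can be verified directly in Hive. A minimal check (a sketch; this query is not part of the original test log):
-- count rows per partition; the result should match the distribution above (40 rows total)
SELECT test_par1, test_par2, COUNT(*) AS row_cnt
FROM test_hive_source
GROUP BY test_par1, test_par2;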
1.1 Test in Flink Engine
1.1.1 Job Config File
env {
# You can set flink configuration here
execution.parallelism = 3
job.name="test_hive_source_to_console"
}
source {
# This is an example input plugin, **only for testing and demonstrating the input plugin feature**
Hive {
table_name = "test_hive.test_hive_source"
metastore_uri = "thrift://ctyun7:9083"
}
# If you would like to get more information about how to configure seatunnel and see full list of input plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}
transform {
}
sink {
# choose stdout output plugin to output data to console
Console {
}
# If you would like to get more information about how to configure seatunnel and see full list of output plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}
1.1.2 Submit Job Command
sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hive_to_console.conf
1.1.3 Check Result in Flink Job log
Some fields in the data are lost; this can be fixed later via #2473. The row count is correct.

1.2 Test in Spark Engine
1.2.1 Job Config File
env {
# You can set spark configuration here
source.parallelism = 3
job.name="test_hive_source_to_console"
}
source {
# This is an example input plugin, **only for testing and demonstrating the input plugin feature**
Hive {
table_name = "test_hive.test_hive_source"
metastore_uri = "thrift://ctyun7:9083"
}
# If you would like to get more information about how to configure seatunnel and see full list of input plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}
transform {
}
sink {
# choose stdout output plugin to output data to console
Console {
}
# If you would like to get more information about how to configure seatunnel and see full list of output plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}
1.2.2 Submit Job Command --deploy-mode client --master local
sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master local
1.2.3 Check data result
Some fields in the data are lost; this can be fixed later. The row count is correct.

1.2.4 Submit Job Command --deploy-mode client --master yarn
sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master yarn
1.2.5 Check Data Result

2. Test the ORC file format table
create table test_hive_source_orc(
test_tinyint TINYINT,
test_smallint SMALLINT,
test_int INT,
test_bigint BIGINT,
test_boolean BOOLEAN,
test_float FLOAT,
test_double DOUBLE,
test_string STRING,
test_binary BINARY,
test_timestamp TIMESTAMP,
test_decimal DECIMAL(8,2),
test_char CHAR(64),
test_varchar VARCHAR(64),
test_date DATE,
test_array ARRAY<INT>,
test_map MAP<STRING, FLOAT>,
test_struct STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
stored as orcfile;
The test data is the same as in the text file format table.
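One way to copy the text-format test data into the ORC table while keeping the same partition layout is a dynamic-partition insert (a sketch; the exact statements used in the test are not recorded here):
-- enable dynamic partitioning so the partition values come from the SELECT
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- SELECT * on a partitioned table returns the data columns followed by the partition
-- columns (test_par1, test_par2), which is exactly the column order the INSERT expects
INSERT INTO TABLE test_hive_source_orc PARTITION (test_par1, test_par2)
SELECT * FROM test_hive_source;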
2.1 Test in Flink Engine
2.1.1 Job Config File
env {
# You can set flink configuration here
execution.parallelism = 3
job.name="test_hiveorc_source_to_console"
}
source {
# This is an example input plugin, **only for testing and demonstrating the input plugin feature**
Hive {
table_name = "test_hive.test_hive_source_orc"
metastore_uri = "thrift://ctyun7:9083"
}
# If you would like to get more information about how to configure seatunnel and see full list of input plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}
transform {
}
sink {
# choose stdout output plugin to output data to console
Console {
}
# If you would like to get more information about how to configure seatunnel and see full list of output plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}
2.1.2 Submit Job Command
sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hiveorc_to_console.conf
2.1.3 Check Data Result
The fields in the data are correct. The row count is correct.
2.2 Test in Spark Engine
2.2.1 Job Config File
env {
# You can set spark configuration here
source.parallelism = 3
job.name="test_hiveorc_source_to_console"
}
source {
# This is an example input plugin, **only for testing and demonstrating the input plugin feature**
Hive {
table_name = "test_hive.test_hive_source_orc"
metastore_uri = "thrift://ctyun7:9083"
}
# If you would like to get more information about how to configure seatunnel and see full list of input plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}
transform {
}
sink {
# choose stdout output plugin to output data to console
Console {
}
# If you would like to get more information about how to configure seatunnel and see full list of output plugins,
# please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}
2.2.2 Submit Job Command
sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hiveorc_to_console.conf --deploy-mode client --master local
2.2.3 Check Data Result
3. Test the Parquet file format table
create table test_hive_source_parquet(
test_tinyint TINYINT,
test_smallint SMALLINT,
test_int INT,
test_bigint BIGINT,
test_boolean BOOLEAN,
test_float FLOAT,
test_double DOUBLE,
test_string STRING,
test_binary BINARY,
test_timestamp TIMESTAMP,
test_decimal DECIMAL(8,2),
test_char CHAR(64),
test_varchar VARCHAR(64),
test_date DATE,
test_array ARRAY<INT>,
test_map MAP<STRING, FLOAT>,
test_struct STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
stored as PARQUET;
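The Parquet table can be loaded with the same dynamic-partition insert shown for the ORC table, and the storage format can be confirmed before pointing the connector at it (a sketch; not part of the original test log):
-- assumes dynamic partitioning is enabled as in the ORC example above
INSERT INTO TABLE test_hive_source_parquet PARTITION (test_par1, test_par2)
SELECT * FROM test_hive_source;
-- check the InputFormat/OutputFormat lines in the output to confirm Parquet storage
DESCRIBE FORMATTED test_hive_source_parquet;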
Please assign it to me, I will try it.
Which issue do you want to work on? I am doing the test now; I will record all the problems found during the test and add them to the todo list here. You can pick whichever issue you want to handle.
I want to work on #2792, but it seems a bit too difficult for me.
I have already fixed #2792. You can look at the other tasks. We have many issues marked as good first issue and help wanted.
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.