HIVE-27725: Remove redundant columns in TAB_COL_STATS and PART_COL_STATS tables
What changes were proposed in this pull request?
- Remove the CAT_NAME, DB_NAME and TABLE_NAME columns from the TAB_COL_STATS table.
- Remove the CAT_NAME, DB_NAME, TABLE_NAME and PARTITION_NAME columns from the PART_COL_STATS table.
Why are the changes needed?
- Refine the metadata schema by removing the redundant columns from TAB_COL_STATS and PART_COL_STATS.
- Unify the get, update and delete paths for table stats, making sure all queries use the join clause to avoid data inconsistency.
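To illustrate the join-based lookup this PR proposes, here is a minimal sketch using sqlite3 as a stand-in for MySQL, with a heavily simplified metastore schema (column sets and types are illustrative, not the real DDL): PART_COL_STATS keeps only PART_ID to identify its owner, and the catalog/db/table/partition names are resolved by joining through PARTITIONS, TBLS and DBS.

```python
# Hypothetical sketch of the stats lookup after this change: no
# CAT_NAME/DB_NAME/TABLE_NAME/PARTITION_NAME columns in PART_COL_STATS;
# the owner is resolved by joining on the ID columns instead.
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE DBS        (DB_ID INTEGER PRIMARY KEY, NAME TEXT, CTLG_NAME TEXT);
CREATE TABLE TBLS       (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
-- After this PR: only PART_ID ties a stats row to its partition.
CREATE TABLE PART_COL_STATS (PART_ID INTEGER, COLUMN_NAME TEXT, ENGINE TEXT, NUM_NULLS INTEGER);

INSERT INTO DBS        VALUES (1,   'db0',  'hive');
INSERT INTO TBLS       VALUES (10,  1,      'tbl0');
INSERT INTO PARTITIONS VALUES (100, 10,     'p=part_val0');
INSERT INTO PART_COL_STATS VALUES (100, 'id0', 'hive', 7);
""")

# Join-based query replacing the old filter on duplicated name columns.
row = c.execute("""
SELECT s.COLUMN_NAME, s.NUM_NULLS
FROM PART_COL_STATS s
JOIN PARTITIONS p ON p.PART_ID = s.PART_ID
JOIN TBLS       t ON t.TBL_ID  = p.TBL_ID
JOIN DBS        d ON d.DB_ID   = t.DB_ID
WHERE d.CTLG_NAME = 'hive' AND d.NAME = 'db0'
  AND t.TBL_NAME = 'tbl0' AND p.PART_NAME = 'p=part_val0'
  AND s.ENGINE = 'hive' AND s.COLUMN_NAME = 'id0'
""").fetchone()
print(row)  # ('id0', 7)
```

Because the names live in exactly one place, a rename or drop can no longer leave stats rows pointing at stale names, which is the data-inconsistency concern mentioned above.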
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?
Add a new unit test.
@saihemanth-cloudera @deniskuzZ @dengzhhu653: Could you help review this PR?
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
22 Code Smells
No Coverage information
No Duplication information
The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here
Need to rerun the CI to get a green pass.
Currently DirectSqlUpdateStat is used only in the ETL flow; can we use the direct SQL stats update in HiveAlterHandler (rename table/partition/col, drop col) as well?
Please check whether all these columns can be removed from hive_metastore.thrift ColumnStatisticsDesc
Currently DirectSqlUpdateStat is used only in the ETL flow; can we use the direct SQL stats update in HiveAlterHandler (rename table/partition/col, drop col) as well?
Yes, we can make some improvements here; for example, renaming a table/partition does not need to update column statistics.
Please check whether all these columns can be removed from hive_metastore.thrift ColumnStatisticsDesc
We cannot remove these columns from ColumnStatisticsDesc, because:
- it is the API data struct, and changing the struct would break API compatibility.
- these columns are meaningful for describing the table or partition metadata.
Applying the change, Hive may suffer an intermittent outage when CBO is on (HiveServer2 connecting to the old HMS while the schema has changed). How about leaving the redundant columns in the database but having HMS not use these columns at all (using the multi-join proposed in this PR)?
@dengzhhu653 Not sure if I understand correctly.
Applying the change, Hive may suffer an intermittent outage when CBO is on (HiveServer2 connecting to the old HMS while the schema has changed).
Do you mean that the upgrade-4.0.0-beta-1-to-4.0.0-beta-2.mysql.sql schema is upgraded but the HMS is not? I think yes: the HMS will encounter errors when fetching table and partition stats, as the direct SQL generated by the old HMS can no longer query the underlying metadata in MySQL. But I don't think it should be a problem; users should know that if they upgrade the metadata schema, they should also upgrade the HMS binary package at the same time.
How about leaving the redundant columns in the database but having HMS not use these columns at all (using the multi-join proposed in this PR)?
IMO, it's not a good idea to leave the redundant columns just so the stats keep working for an older HMS. Now that you've upgraded the metadata schema, why wouldn't you upgrade the HMS binary package as well? I can't think of a reason not to.
Also, it seems we never promised that an old HMS can work well with a newer, upgraded metadata schema. Any change to the metadata schema can make an old HMS hit errors, not just this PR.
@zhangbutao
But I don't think it should be a problem; users should know that if they upgrade the metadata schema, they should also upgrade the HMS binary package at the same time.
In my opinion, the schema upgrade and the binary upgrade can't happen at the same time, especially when there are multiple HMS instances and clusters. This change drops columns from an existing table, so for safety we need to back the table up before doing anything, and prevent the old HMS from writing into it during the upgrade; and you cannot tell users not to roll back to the old version if they find something wrong.
Any change to the metadata schema can make an old HMS hit errors, not just this PR.
That's possible, so we need to reduce the chance of such a thing recurring, especially so that users can recover from failure.
Hi @dengzhhu653, I agree with @zhangbutao that we should also clean up the schema in this patch. If we want a smooth upgrade for this feature, we would need to do the following:
- Stage 1: new_version1 reads stats by joining on IDs and still writes all the redundant names (just as you suggest); old_version reads by names.
- Stage 2: new_version2 writes without the redundant names; both new_version2 and new_version1 read by joining on IDs.
- Stage 3: remove the redundant columns from the schema (new_version3 for the schema update).
We would need three patches (versions) to land this feature, which seems a bit tedious. If we instead combine all the changes in this patch, upgrade the HMS clusters one by one, and finally change the schema, then the only issue would be that the old HMS instances could not read the new stats created by the new HMS instances, which I think is acceptable. Furthermore, we can take all the old instances offline during the upgrade to avoid this issue.
Can you fix the failed tests? http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4744/18/tests
@zhangbutao CI is green.
There are some discussions about the performance regression after deprecating the redundant columns: https://issues.apache.org/jira/browse/HIVE-11786
Performance regression is what we need to be concerned about. @wecharyu has done some benchmarking to validate that performance is OK after this change. But the benchmark was run with the local HMS benchmark tools, so maybe it does not reflect the actual production use case?
It's better to do a more extreme benchmark test, as HIVE-11786 said, with 2 million+ rows in PART_COL_STATS.
@wecharyu Do you have any actual production use case (big partition counts with PART_COL_STATS) showing no performance regression for this change?
@dengzhhu653 @zhangbutao I made another benchmark test with 2 million partition stats written in advance; it shows no obvious performance regression:
mysql> select count(1) from PART_COL_STATS;
+----------+
| count(1) |
+----------+
| 2000000 |
+----------+
1 row in set (0.40 sec)
benchmark test:
java -jar ./hmsbench-jar-with-dependencies.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -N 1000 -o bench_results_direct.csv -C -d testbench_http --params=100 -E 'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 'listPartitions.*' -E 'getPartitionsByNames.*' -E 'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getPartitions' -E 'getPartitions.10' -E 'getPartitions.100' -E 'getPartitions.1000' -E 'addPartition.*' -E 'addPartitions.*' -E 'alterPartitions.*' -E 'getNid' -E 'listDatabases' -E 'getTable' -E 'createTable' -E 'openTxn.*'
- before this patch
Operation Mean Med Min Max Err%
getPartitionsStat 5.21167 5.16801 4.92140 6.05965 3.77022
getPartitionsStat.100 6.93186 6.83728 6.48675 10.2091 6.80759
getPartitionsStat.1000 15.1901 14.8172 14.3164 19.6772 6.61940
updatePartitionsStat 9.83066 9.63766 9.27253 16.3278 9.28177
updatePartitionsStat.100 1009.46 1009.26 991.282 1052.16 0.956140
updatePartitionsStat.1000 10091.7 10088.1 9929.50 10309.3 0.760790
- after this patch
Operation Mean Med Min Max Err%
getPartitionsStat 5.56409 5.49373 5.20583 7.02619 5.03727
getPartitionsStat.100 6.34526 6.29966 5.97725 7.85943 4.11913
getPartitionsStat.1000 14.2403 14.1247 13.6040 15.8745 3.02256
updatePartitionsStat 10.5586 10.3743 9.88599 14.8948 7.01613
updatePartitionsStat.100 1013.06 1011.71 978.329 1047.57 1.45127
updatePartitionsStat.1000 9912.52 9905.62 9677.24 10163.9 1.22903
From what I see in benchmarkGetPartitionsStat, it looks like there is only one table with thousands of partitions and column stats; am I missing something?
I guess any performance regression would be caused by the multi-way join after removing the columns. How many databases, tables, and partitions are in the bench test?
Thanks, Zhihua
From what I see in benchmarkGetPartitionsStat, it looks like there is only one table with thousands of partitions and column stats; am I missing something?
2 million partition column stats were added to MySQL before the benchmark test; benchmarkGetPartitionsStat still only queries the count given by the -N options.
I guess any performance regression would be caused by the multi-way join after removing the columns. How many databases, tables, and partitions are in the bench test?
The 2 million partition column stats break down as: 10 dbs * 10 tables per db * 200 partitions per table * 100 column stats per partition.
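As a quick sanity check of the stated data shape (a small arithmetic sketch, nothing more), the four factors above do multiply out to the 2,000,000 rows reported by the earlier `select count(1) from PART_COL_STATS` query:

```python
# Verify the benchmark's data shape matches the reported row count.
dbs, tbls_per_db, parts_per_tbl, stats_per_part = 10, 10, 200, 100
total = dbs * tbls_per_db * parts_per_tbl * stats_per_part
print(total)  # 2000000
```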
BTW, all the joins across the DBS, TBLS, and PARTITIONS tables hit indexes, so I don't think such joins would cause a performance regression:
mysql> explain select * from PART_COL_STATS join PARTITIONS on PARTITIONS.PART_ID=PART_COL_STATS.PART_ID join TBLS on TBLS.TBL_ID=PARTITIONS.TBL_ID join DBS on DBS.DB_ID=TBLS.DB_ID where DBS.NAME='db0' and DBS.CTLG_NAME='hive' and
TBLS.TBL_NAME='tbl0' and PARTITIONS.PART_NAME='p=part_val0' and PART_COL_STATS.engine='hive' and PART_COL_STATS.COLUMN_NAME='id0';
+----+-------------+----------------+------------+-------+----------------------------------+-----------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+------------+-------+----------------------------------+-----------------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | DBS | NULL | const | PRIMARY,UNIQUE_DATABASE,CTLG_FK1 | UNIQUE_DATABASE | 389 | const,const | 1 | 100.00 | NULL |
| 1 | SIMPLE | TBLS | NULL | const | PRIMARY,UNIQUETABLE,TBLS_N49 | UNIQUETABLE | 268 | const,const | 1 | 100.00 | NULL |
| 1 | SIMPLE | PARTITIONS | NULL | const | PRIMARY,UNIQUEPARTITION | UNIQUEPARTITION | 779 | const,const | 1 | 100.00 | NULL |
| 1 | SIMPLE | PART_COL_STATS | NULL | ref | PCS_STATS_IDX | PCS_STATS_IDX | 777 | const,const | 1 | 10.00 | Using where |
+----+-------------+----------------+------------+-------+----------------------------------+-----------------+---------+-------------+------+----------+-------------+
4 rows in set, 1 warning (0.00 sec)
In production, I've seen an average of 100 tables x 1000 partitions x 100 columns.
2 consecutive builds failed with the same failure: TestMiniLlapLocalCliDriver.testCliDriver[partition_explain_ddl]. @wecharyu, should we regenerate the q file and check the diff?
@deniskuzZ The partition_explain_ddl.q qtest seems flaky. I can pass this test using the CI Docker image, but I'm not sure why it failed in CI. On the other hand, this test fails on my local machine with a different JDK (1.8.0_392-b08).
The difference I could find locally is that it generates different bitVectors in the temp column stats file, but I'm not sure why. It seems unrelated to this patch, because getting or updating PART_COL_STATS in HMS only takes effect after the temp stats file is generated. Is it reasonable to disable this test for now?
Let's see the results of the flaky check first: http://ci.hive.apache.org/job/hive-flaky-check/807/
@wecharyu, the flaky run reported green, so it looks like this PR introduced the problem.
@deniskuzZ I could not reproduce this test failure in Docker :( Here is a test result:
jenkins@ip-10-169-5-104:~/hive/itests/qtest$ sudo mvn test -Dtest=org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver -Dqfile=partition_explain_ddl.q
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ hive-it-qfile ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.144 s - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 51.450 s
[INFO] Finished at: 2024-02-02T17:50:19Z
[INFO] ------------------------------------------------------------------------
Could you also run this test locally and in Docker, on both this patch and master, to see if there are any failures?
Started a flaky check on your branch: http://ci.hive.apache.org/job/hive-flaky-check/808/
It seems no failure... http://ci.hive.apache.org/job/hive-flaky-check/808/testReport/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/
Yeah, then probably some other test affects it
some API change
HMSBenchmarks.java:450: error: no suitable constructor found for PartitionsStatsRequest(String,String,List<String>,List<String>,String)
Restarted; some unrelated tests failed. Sonar reported 20 new issues: are those new or just code movement?
No new code; the Sonar-reported issues are pre-existing.