HIVE-28523: Improve performance when drop partition or drop table
What changes were proposed in this pull request?
Reduce the number of getDnsPath calls to improve performance. https://issues.apache.org/jira/browse/HIVE-28523
Why are the changes needed?
When the number of partitions is large, traversing them serially and calling wh.getDnsPath for each one takes a huge amount of time, so this code path is worth optimizing. For example, with 200,000 partitions and 10 milliseconds per getDnsPath call, the serial fetch takes about 2,000 seconds, which is roughly 33 minutes.
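As a quick check of the estimate above, the following minimal sketch multiplies the per-call latency by the partition count. The class and method names are hypothetical and are not part of the patch; the 10 ms figure is the example value from the description, not a measurement.

```java
import java.time.Duration;

final class SerialDnsPathCost {
    // Total wall-clock time when every partition triggers one serial call.
    static Duration serialCost(long partitions, Duration perCall) {
        return perCall.multipliedBy(partitions);
    }

    public static void main(String[] args) {
        // Example values from the PR description: 200,000 partitions, 10 ms per call.
        Duration total = serialCost(200_000, Duration.ofMillis(10));
        System.out.println(total.getSeconds() + " s, about " + total.toMinutes() + " min");
    }
}
```

With these inputs the total is 2,000 seconds, so the cost grows linearly with the partition count as long as the calls stay serial.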
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
By unit tests
Quality Gate passed
Issues
11 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
When is the tableDnsPath empty? And why does this improve performance?
Could you please provide more detail about this optimization?
For example, if you suspect that wh.getDnsPath takes a long time when dropping partitions, you can run jstack on the HMS Java pid, and you should see the thread calling wh.getDnsPath in a WAITING or BLOCKED state for a long time.
So, could you run jstack on HMS to help us understand this better? Thanks.
Could you please provide more detail about this optimization? For example, if you suspect that
wh.getDnsPath takes a long time when dropping partitions, you can run jstack on the HMS Java pid, and you should see the thread calling wh.getDnsPath in a WAITING or BLOCKED state for a long time. So, could you run jstack on HMS to help us understand this better? Thanks.
The calls are executed serially, so the thread should not show up in a WAITING or BLOCKED state. I added logging of the elapsed time around this code, and it shows that there is a performance problem here. I think that is a good indication of the problem.
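The kind of elapsed-time measurement mentioned here could look roughly like the sketch below. It is only an illustration: the resolver is a hypothetical stand-in that sleeps for a few milliseconds, not the real wh.getDnsPath, and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class SerialLoopTimer {
    // Calls the resolver once per path, one after another, and returns the elapsed nanoseconds.
    static long timeSerial(List<String> paths, UnaryOperator<String> resolver) {
        long start = System.nanoTime();
        for (String path : paths) {
            resolver.apply(path);
        }
        return System.nanoTime() - start;
    }

    // Hypothetical slow lookup: sleeps to imitate a per-call latency.
    static String slowResolve(String path) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return path;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            paths.add("/warehouse/t/p=" + i);
        }
        long elapsedMs = timeSerial(paths, SerialLoopTimer::slowResolve) / 1_000_000;
        System.out.println("resolved " + paths.size() + " paths serially in " + elapsedMs + " ms");
    }
}
```

Logging the total for the whole loop, rather than per call, keeps the measurement overhead small and makes the linear growth with the number of partitions easy to see.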
When is the tableDnsPath empty? And why does this improve performance?
When the table type is a view, the tableDnsPath will be null. Performance improves because the filesystem client is obtained fewer times.
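One way such a reduction could work is sketched below. This is not the code from the patch: it assumes that a partition located under the table's already-qualified path can reuse the table's scheme and authority, and that only other partitions (or every partition when tableDnsPath is null, as for a view) need the expensive lookup. The class, the methods, and the `hdfs://nn1:8020` authority are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class DnsPathReuseSketch {
    // Counts invocations of the hypothetical expensive lookup.
    static int resolverCalls = 0;

    // Stand-in for an expensive call such as wh.getDnsPath; it only prefixes a fixed authority.
    static String expensiveResolve(String path) {
        resolverCalls++;
        return "hdfs://nn1:8020" + path;
    }

    // True when the unqualified partition path lies below the table's path.
    static boolean isUnder(String tableDnsPath, String partitionPath) {
        String tablePath = URI.create(tableDnsPath).getPath();
        String prefix = tablePath.endsWith("/") ? tablePath : tablePath + "/";
        return partitionPath.startsWith(prefix);
    }

    // Qualifies each partition path, reusing the table's scheme and authority where possible.
    static List<String> qualifyPartitions(String tableDnsPath, List<String> partitionPaths) {
        List<String> qualified = new ArrayList<>();
        for (String partitionPath : partitionPaths) {
            if (tableDnsPath != null && isUnder(tableDnsPath, partitionPath)) {
                URI table = URI.create(tableDnsPath);
                qualified.add(table.getScheme() + "://" + table.getAuthority() + partitionPath);
            } else {
                // No usable table path (for example a view) or a partition stored elsewhere.
                qualified.add(expensiveResolve(partitionPath));
            }
        }
        return qualified;
    }
}
```

In this sketch, a table with many partitions under its own directory triggers the expensive lookup only for partitions stored elsewhere, while a null tableDnsPath falls back to one lookup per partition.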
@Leonidas963 I checked https://issues.apache.org/jira/browse/HIVE-24838, which aimed to reduce filesystem calls for object stores. I think it may also be applicable to the HDFS filesystem.
HIVE-24838 was merged into Hive 4. Can you try testing HIVE-24838 in your Hadoop cluster, with the metastore property set as follows: hive.blobstore.supported.schemes=hdfs,s3,s3a,s3n
BTW, I will check your fix later.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.