HIVE-28523: Improve performance when drop partition or drop table

Open carl239 opened this issue 1 year ago • 6 comments

What changes were proposed in this pull request?

Reduce the number of getDnsPath calls to improve performance. See https://issues.apache.org/jira/browse/HIVE-28523

Why are the changes needed?

When the number of partitions is large, traversing them serially and calling wh.getDnsPath for each one takes a very long time, so this code path is worth optimizing. For example, with 200,000 partitions and 10 milliseconds per getDnsPath call, the serial fetch takes about 2,000 seconds, or roughly 33 minutes.
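The estimate is simple arithmetic: 200,000 calls × 10 ms = 2,000 s. The sketch below is not the patch in this pull request. It is a minimal illustration, under stated assumptions, of the general idea of resolving the qualified path once per table and reusing its scheme and authority for partition locations under the table directory. `DnsPathSketch`, `qualifyAll`, and the resolver are hypothetical names; the resolver is a stand-in for the expensive call and is assumed to return a string that ends with the path it was given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class DnsPathSketch {

    /** Estimated serial cost in seconds when every partition triggers one resolution. */
    public static long estimateSerialSeconds(long partitions, long millisPerCall) {
        return partitions * millisPerCall / 1000;
    }

    /**
     * Qualifies partition locations while calling the (expensive) resolver as rarely as possible.
     * Assumption: resolver.apply(path) returns prefix + path, e.g. "hdfs://nn:8020" + "/warehouse/t".
     * A null tableLocation (for example, a view) falls back to resolving each location individually.
     */
    public static List<String> qualifyAll(String tableLocation,
                                          List<String> partitionLocations,
                                          UnaryOperator<String> resolver) {
        String prefix = null;
        if (tableLocation != null) {
            String tableDnsPath = resolver.apply(tableLocation); // one call for the whole table
            if (tableDnsPath != null && tableDnsPath.endsWith(tableLocation)) {
                prefix = tableDnsPath.substring(0, tableDnsPath.length() - tableLocation.length());
            }
        }
        List<String> qualified = new ArrayList<>();
        for (String location : partitionLocations) {
            if (prefix != null && location.startsWith(tableLocation + "/")) {
                qualified.add(prefix + location);          // reuse the table's scheme and authority
            } else {
                qualified.add(resolver.apply(location));   // outside the table directory: resolve it
            }
        }
        return qualified;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        UnaryOperator<String> resolver = p -> { calls[0]++; return "hdfs://nn:8020" + p; };

        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partitions.add("/warehouse/t/p=" + i);
        }
        partitions.add("/other/location/p=x");

        qualifyAll("/warehouse/t", partitions, resolver);
        System.out.println("resolver calls: " + calls[0]);
        System.out.println("estimated serial seconds: " + estimateSerialSeconds(200_000, 10));
    }
}
```

In this stand-in, 1,001 locations cost two resolver calls (one for the table, one for the location outside the table directory) instead of 1,001.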

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

By unit tests.

carl239 avatar Sep 13 '24 08:09 carl239

When is tableDnsPath empty? And why does this improve performance?

huiboliu2020 avatar Sep 14 '24 02:09 huiboliu2020

Could you please provide more detail about this optimization? For example, if you suspect that wh.getDnsPath takes a long time when dropping partitions, you can run jstack against the HMS Java PID, and you should see the thread calling wh.getDnsPath stay in a WAITING or BLOCKED state for a long time.

So, could you run jstack on HMS to help us understand this better? Thanks.

zhangbutao avatar Sep 14 '24 14:09 zhangbutao

Could you please provide more detail about this optimization? For example, if you suspect that wh.getDnsPath takes a long time when dropping partitions, you can run jstack against the HMS Java PID, and you should see the thread calling wh.getDnsPath stay in a WAITING or BLOCKED state for a long time.

So, could you run jstack on HMS to help us understand this better? Thanks.

The calls are executed serially, so the thread would not necessarily show up in a WAITING or BLOCKED state. I added timing logs here, and they show that this code path has a performance problem. I think that is a good indication of the problem.

[Image: get_dns_path_cost]
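For readers who want to take a similar measurement, here is a minimal sketch of accumulating elapsed time around a serial loop with `System.nanoTime()`. It is not the code that produced the measurement in this comment; `TimingSketch`, `timeAll`, and the sleeping resolver are hypothetical stand-ins for the real call.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class TimingSketch {

    /** Applies the resolver to each location serially and returns the total elapsed nanoseconds. */
    public static long timeAll(List<String> locations, UnaryOperator<String> resolver) {
        long totalNanos = 0;
        for (String location : locations) {
            long start = System.nanoTime();
            resolver.apply(location);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos;
    }

    public static void main(String[] args) {
        // Stand-in for an expensive call; the 2 ms sleep is an arbitrary illustrative cost.
        UnaryOperator<String> slowResolver = p -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "hdfs://nn:8020" + p;
        };

        long nanos = timeAll(
            List.of("/warehouse/t/p=1", "/warehouse/t/p=2", "/warehouse/t/p=3"), slowResolver);
        System.out.println("total resolver time: " + nanos / 1_000_000 + " ms");
    }
}
```

Because the loop is serial, the total grows linearly with the number of partitions, which is why a thread dump may simply show the thread running rather than waiting.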

carl239 avatar Sep 14 '24 16:09 carl239

When is tableDnsPath empty? And why does this improve performance?

When the table type is a view, tableDnsPath will be null. Performance improves because the file system client is obtained fewer times.

carl239 avatar Sep 14 '24 16:09 carl239

@Leonidas963 I checked https://issues.apache.org/jira/browse/HIVE-24838, which aimed to reduce file system calls to object stores. I think it may also apply to the HDFS file system. HIVE-24838 was merged into Hive 4. Could you test HIVE-24838 in your Hadoop cluster and set the metastore property as follows: hive.blobstore.supported.schemes=hdfs,s3,s3a,s3n

BTW, I will check your fix later.

zhangbutao avatar Oct 12 '24 06:10 zhangbutao

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Dec 12 '24 00:12 github-actions[bot]