
[SPARK-49827][SQL] Fetching all partitions from hive metastore in batches

Open Madhukar525722 opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

When a predicate is missing in getPartitionsByFilter and it falls back to fetching all the partitions, the request is broken into smaller chunks as follows:

  1. Retrieve the names of all partitions using getPartitionNames.
  2. Divide the list of partition names into smaller batches.
  3. Fetch the partitions by name using getPartitionsByNames.
  4. If a fetch fails, reduce the batch size by 2 and retry with fewer partitions until maxRetries is reached.
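
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name, the two callables standing in for getPartitionNames and getPartitionsByNames, and the parameter names are all hypothetical. It also assumes that "reduces the batch size by 2" means halving, and that retries are counted across the whole fetch rather than per batch.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def fetch_all_partitions_in_batches(
    get_partition_names: Callable[[], Sequence[str]],
    get_partitions_by_names: Callable[[Sequence[str]], List[P]],
    batch_size: int,
    max_retries: int,
) -> List[P]:
    """Fetch every partition in batches, shrinking the batch after a failure.

    Both callables are hypothetical stand-ins for the metastore client calls
    named in the description above.
    """
    if batch_size <= 0:
        raise ValueError("this sketch only covers the batched path")

    # Step 1: fetch only the partition names, which is a small payload.
    names = list(get_partition_names())

    partitions: List[P] = []
    current_size = batch_size
    retries = 0
    index = 0
    while index < len(names):
        # Step 2: take the next batch of names.
        batch = names[index:index + current_size]
        try:
            # Step 3: fetch full partition metadata for this batch only.
            partitions.extend(get_partitions_by_names(batch))
            index += len(batch)
        except Exception:
            # Step 4: on failure, shrink the batch and retry the same offset.
            # Assumption: the batch size is halved and retries are global.
            retries += 1
            if retries > max_retries or current_size == 1:
                raise
            current_size = max(1, current_size // 2)
    return partitions
```

Because a failed batch is retried from the same offset, no partition is skipped or fetched twice; only the size of subsequent requests changes.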

Why are the changes needed?

The change addresses heavy load on HMS. When a table has a huge number of partitions (~600,000), the metadata size exceeds the 2 GB limit on the Thrift server buffer size. As a result, we get socket timeouts and HMS also crashes with OOM. The patch tries to replicate the same behaviour as HIVE-27505.
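
As a rough back-of-the-envelope check of the numbers above, the snippet below works out how little metadata each partition can carry before a single response for ~600,000 partitions crosses the limit, and how many requests a batch size of 1000 would produce. It assumes the limit is exactly 2 GiB and that metadata is spread evenly across partitions; both are simplifications, and the variable names are illustrative only.

```python
import math

# Assumption: the "2 GB" buffer limit is taken as exactly 2 GiB.
THRIFT_LIMIT_BYTES = 2 * 1024 ** 3
PARTITION_COUNT = 600_000   # approximate figure from the description
BATCH_SIZE = 1000           # example batch size from the description

# Average metadata per partition at which one unbatched response hits the limit.
bytes_per_partition_budget = THRIFT_LIMIT_BYTES / PARTITION_COUNT

# Number of getPartitionsByNames-style requests when fetching in batches.
request_count = math.ceil(PARTITION_COUNT / BATCH_SIZE)

print(f"per-partition budget: {bytes_per_partition_budget:.0f} bytes")
print(f"requests at batch size {BATCH_SIZE}: {request_count}")
```

Under these assumptions the budget is only about 3.5 KB per partition, which is why a single unbatched response can overflow, while batching splits the same work into 600 much smaller responses.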

Does this PR introduce any user-facing change?

Yes. To enable batching, users should set the following parameters:

  1. spark.sql.hive.metastore.batchSize = 1000 (batching is disabled by default)
  2. spark.sql.metastore.partition.batch.retry.count = 3
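
For illustration, the two settings could be passed to spark-submit through its standard `--conf key=value` flag. The small helper below only builds that argument list; the helper and variable names are hypothetical. The configuration keys are copied from the description above and would only take effect in a Spark build that contains this patch.

```python
from typing import Dict, List

# Keys and example values as given in the PR description.
batching_conf: Dict[str, str] = {
    "spark.sql.hive.metastore.batchSize": "1000",
    "spark.sql.metastore.partition.batch.retry.count": "3",
}


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a config mapping into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_spark_submit_args(batching_conf)))
```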

How was this patch tested?

Tested in a local environment, with the following performance:

With batch size = 1:
24/09/28 18:11:21 INFO Shim_v2_3: Fetching all partitions completed in 717 ms

With batch size = -1:
24/09/28 18:14:16 INFO Shim_v2_3: Fetching all partitions completed in 51 ms.

With batch size = 10:
24/09/28 18:16:20 INFO Shim_v2_3: Fetching all partitions completed in 115 ms.

Was this patch authored or co-authored using generative AI tooling?

No

Madhukar525722, Oct 03 '24 11:10

+CC @shardulm94

mridulm, Oct 05 '24 16:10

Gentle reminder @mridulm @pan3793 @HyukjinKwon @shardulm94

Madhukar525722, Oct 15 '24 11:10

Gentle ping @mridulm @pan3793 @HyukjinKwon @shardulm94. Please review the change.

Madhukar525722, Nov 01 '24 18:11

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot], Feb 10 '25 00:02