presto icon indicating copy to clipboard operation
presto copied to clipboard

Add asynchronous split generation in Hudi connector

Open yihua opened this issue 3 years ago • 2 comments

Change logs

This PR introduces the asynchronous split generation in Hudi connector to speed up the query execution and reduce overall query finishing time. The asynchronous split generation is inspired by the background split loader in Hive connector, which asynchronously generates the splits into a queue in the background, while the query execution can proceed to process and read the splits as the split generation is still ongoing, so that the query execution is not completely blocked on the split generation for all partitions.

Two new configuration properties are added:

  • hudi.max-outstanding-splits (hudi.max_outstanding_splits as session property): maximum outstanding splits in a batch enqueued for processing
  • hudi.split-generator-parallelism (hudi.split_generator_parallelism as session property): number of threads to generate splits from partitions

Test plan

The changes are benchmarked through TPC-DS queries using Hudi connector, with and without metadata table enabled. The Presto cluster consists of 1 coordinator and 10 workers in r5.4xlarge type. This PR significantly improves the query latency of selective representative TPC-DS 1TB queries by 2.8x (64% latency reduction) and 4.3x (77% latency reduction), for no metadata table and using metadata table, respectively. With the asynchronous split generation in Hudi connector, the performance of a query in Hudi connector is similar to or better than that in Hive connector.

Release notes

== RELEASE NOTES ==

General Changes
* Introduces the asynchronous split generation in Hudi connector to speed up the query execution and reduce overall query finishing time.
  Two new configuration properties are added.  `hudi.max-outstanding-splits` (`hudi.max_outstanding_splits` as session property) controls the maximum outstanding splits in a batch enqueued for processing.  `hudi.split-generator-parallelism` (`hudi.split_generator_parallelism` as session property) controls the number of threads to generate splits from partitions.

yihua avatar Aug 22 '22 08:08 yihua

@codope @7c00 I addressed the review comments. PTAL.

yihua avatar Sep 07 '22 04:09 yihua

@codope Looks like a few tests outside Hudi are flaky. Retrying them.

yihua avatar Sep 14 '22 18:09 yihua

@7c00 Can you please take another pass? All comments are addressed.

codope avatar Feb 07 '23 05:02 codope

Hi @7c00 - we are kind of blocked here for few weeks. If you don't mind, can I take this over, review from the top and land?

vinothchandar avatar Feb 14 '23 16:02 vinothchandar

Still have not approved the review. Just reset the previous review.

vinothchandar avatar Feb 22 '23 18:02 vinothchandar

LGTM overall. but left a few follow-ups. IMO having a parallelism of 8 to access partition info from metastore may be a tad aggressive? Thoughts?

Actually, this was tuned for tpcds read benchmark. I would bring down to 4 which should suffice for most use cases and then I will followup with a docs PR to update the configs.

codope avatar Feb 23 '23 19:02 codope