presto
presto copied to clipboard
Add asynchronous split generation in Hudi connector
Change logs
This PR introduces the asynchronous split generation in Hudi connector to speed up the query execution and reduce overall query finishing time. The asynchronous split generation is inspired by the background split loader in Hive connector, which asynchronously generates the splits into a queue in the background, while the query execution can proceed to process and read the splits as the split generation is still ongoing, so that the query execution is not completely blocked on the split generation for all partitions.
Two new configuration properties are added:
hudi.max-outstanding-splits(hudi.max_outstanding_splitsas session property): maximum outstanding splits in a batch enqueued for processinghudi.split-generator-parallelism(hudi.split_generator_parallelismas session property): number of threads to generate splits from partitions
Test plan
The changes are benchmarked through TPC-DS queries using Hudi connector, with and without metadata table enabled. The Presto cluster consists of 1 coordinator and 10 workers in r5.4xlarge type. This PR significantly improves the query latency of selective representative TPC-DS 1TB queries by 2.8x (64% latency reduction) and 4.3x (77% latency reduction), for no metadata table and using metadata table, respectively. With the asynchronous split generation in Hudi connector, the performance of a query in Hudi connector is similar to or better than that in Hive connector.
Release notes
== RELEASE NOTES ==
General Changes
* Introduces the asynchronous split generation in Hudi connector to speed up the query execution and reduce overall query finishing time.
Two new configuration properties are added. `hudi.max-outstanding-splits` (`hudi.max_outstanding_splits` as session property) controls the maximum outstanding splits in a batch enqueued for processing. `hudi.split-generator-parallelism` (`hudi.split_generator_parallelism` as session property) controls the number of threads to generate splits from partitions.
@codope @7c00 I addressed the review comments. PTAL.
@codope Looks like a few tests outside Hudi are flaky. Retrying them.
@7c00 Can you please take another pass? All comments are addressed.
Hi @7c00 - we are kind of blocked here for few weeks. If you don't mind, can I take this over, review from the top and land?
Still have not approved the review. Just reset the previous review.
LGTM overall. but left a few follow-ups. IMO having a parallelism of 8 to access partition info from metastore may be a tad aggressive? Thoughts?
Actually, this was tuned for tpcds read benchmark. I would bring down to 4 which should suffice for most use cases and then I will followup with a docs PR to update the configs.