[AMORO-3289] Avoid calling getMixedTablePartitionSpecById in the scan loop

Open 7hong opened this issue 1 year ago • 1 comments

Why are the changes needed?

Close #3289.

How was this patch tested?

In my environment, there is a table with many partitions waiting to be merged, and each plan takes a long time. After the optimization, the planning time is significantly reduced.

image

7hong avatar Oct 22 '24 08:10 7hong

@zhoujinsong @majin1102 Do you have time to review it? Thanks

7hong avatar Oct 24 '24 02:10 7hong

image

Does the old code trigger a reload of TableMetadata?

If a reload of TableMetadata is not triggered, the new code seems to perform the same as the old code.

Iceberg TableMetadata also returns the specs Map object from memory.

image image

baiyangtx avatar Nov 06 '24 02:11 baiyangtx

@baiyangtx Yes, the old code refreshes the TableMetadata. In particular, when the getUGI method is called in a Kerberos-enabled environment, the call blocks synchronously.
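
The pattern under discussion can be illustrated with a simplified sketch. This is not the actual Amoro code: `FakeTable`, `loadSpecs`, `specById`, and both `plan...` methods are hypothetical names, and the "refresh" is only a counter standing in for the metadata reload. It assumes the per-call lookup reloads metadata each time, as described above, and shows how loading the spec map once before the loop avoids that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpecLookupSketch {

  /** Hypothetical stand-in for a table whose spec lookup refreshes metadata on every call. */
  static class FakeTable {
    private final Map<Integer, String> specs = new HashMap<>();
    int refreshCount = 0; // counts simulated metadata refreshes

    FakeTable() {
      specs.put(0, "unpartitioned");
      specs.put(1, "day(ts)");
    }

    // Simulates an expensive refresh (assumed to be the costly, possibly blocking step).
    Map<Integer, String> loadSpecs() {
      refreshCount++;
      return new HashMap<>(specs);
    }

    // Per-call lookup: every invocation triggers a refresh.
    String specById(int specId) {
      return loadSpecs().get(specId);
    }
  }

  // Before: one lookup, and therefore one refresh, per scanned file.
  static List<String> planWithLookupInLoop(FakeTable table, int[] fileSpecIds) {
    List<String> result = new ArrayList<>();
    for (int specId : fileSpecIds) {
      result.add(table.specById(specId));
    }
    return result;
  }

  // After: load the spec map once, then resolve ids from the local map.
  static List<String> planWithHoistedLookup(FakeTable table, int[] fileSpecIds) {
    Map<Integer, String> specs = table.loadSpecs();
    List<String> result = new ArrayList<>();
    for (int specId : fileSpecIds) {
      result.add(specs.get(specId));
    }
    return result;
  }

  public static void main(String[] args) {
    int[] fileSpecIds = {0, 1, 1, 0, 1};
    FakeTable slow = new FakeTable();
    FakeTable fast = new FakeTable();
    List<String> a = planWithLookupInLoop(slow, fileSpecIds);
    List<String> b = planWithHoistedLookup(fast, fileSpecIds);
    System.out.println(a.equals(b)); // same planning result either way
    System.out.println(slow.refreshCount + " vs " + fast.refreshCount); // prints "5 vs 1"
  }
}
```

Both variants resolve the same specs; only the number of simulated refreshes differs, which is where the time goes when each refresh is expensive.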

7hong avatar Nov 07 '24 06:11 7hong

@czy006 Flame graph before optimization:

image

Flame graph after optimization:

image

7hong avatar Dec 02 '24 11:12 7hong