
[#4089] fix(hive catalog): the problem of slow acquisition of hive table list

Open mygrsun opened this issue 1 year ago • 5 comments

What changes were proposed in this pull request?

Fixes the slow retrieval of the Hive table list by using listTableNamesByFilter in place of the getTableObjectsByName method.
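The difference between the two approaches can be sketched with a small in-memory model. This is only an illustration and not the code from this PR: `TableSketch`, `MetastoreClientSketch`, `InMemoryMetastore`, and `ListTablesSketch` are hypothetical names, and only the two method names are taken from the description above. The fake ignores the filter string and treats a negative `maxTables` as "no limit", which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a full table object (columns, parameters, storage info, ...).
final class TableSketch {
  final String name;

  TableSketch(String name) {
    this.name = name;
  }
}

// Hypothetical client interface; only the two method names come from the PR description.
interface MetastoreClientSketch {
  List<String> listTableNamesByFilter(String dbName, String filter, short maxTables);

  List<TableSketch> getTableObjectsByName(String dbName, List<String> tableNames);
}

// In-memory fake that counts how many full table objects it had to hand out.
final class InMemoryMetastore implements MetastoreClientSketch {
  private final Map<String, TableSketch> tables = new LinkedHashMap<>();
  int tableObjectsLoaded = 0;

  void addTable(String name) {
    tables.put(name, new TableSketch(name));
  }

  @Override
  public List<String> listTableNamesByFilter(String dbName, String filter, short maxTables) {
    // Sketch only: dbName and filter are ignored; a negative maxTables means "no limit" here.
    List<String> names = new ArrayList<>();
    for (String name : tables.keySet()) {
      if (maxTables >= 0 && names.size() >= maxTables) {
        break;
      }
      names.add(name);
    }
    return names;
  }

  @Override
  public List<TableSketch> getTableObjectsByName(String dbName, List<String> tableNames) {
    List<TableSketch> result = new ArrayList<>();
    for (String name : tableNames) {
      TableSketch table = tables.get(name);
      if (table != null) {
        result.add(table);
        tableObjectsLoaded++; // each full object is the expensive part in this model
      }
    }
    return result;
  }
}

final class ListTablesSketch {
  // Old shape: load every full table object, then keep only its name.
  static List<String> listViaTableObjects(
      MetastoreClientSketch client, String db, List<String> candidateNames) {
    List<String> names = new ArrayList<>();
    for (TableSketch table : client.getTableObjectsByName(db, candidateNames)) {
      names.add(table.name);
    }
    return names;
  }

  // New shape: ask only for the names, so no table objects are loaded.
  static List<String> listViaNameFilter(MetastoreClientSketch client, String db) {
    return client.listTableNamesByFilter(db, "", (short) -1);
  }

  public static void main(String[] args) {
    InMemoryMetastore store = new InMemoryMetastore();
    for (int i = 0; i < 5; i++) {
      store.addTable("table_" + i);
    }
    List<String> byFilter = listViaNameFilter(store, "db");
    System.out.println("objects loaded after name-only listing: " + store.tableObjectsLoaded);
    List<String> byObjects = listViaTableObjects(store, "db", byFilter);
    System.out.println("objects loaded after object-based listing: " + store.tableObjectsLoaded);
    System.out.println("same names: " + byFilter.equals(byObjects));
  }
}
```

In this model the name-only path never touches a table object, while the object-based path loads one per table, which is the cost that grows with the number of tables in a schema.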

Why are the changes needed?

I found that listing tables takes 300s when a schema has 5000 tables.

Fix: #4089

Does this PR introduce any user-facing change?

no

How was this patch tested?

Manual testing

mygrsun avatar Aug 09 '24 12:08 mygrsun

Hi @mygrsun, could you please resolve the comment and the CI issue? Thanks

mchades avatar Aug 12 '24 07:08 mchades

@FANNG1 @yuqi1129 Can you help review it?

mchades avatar Aug 19 '24 03:08 mchades

The CI issue is now a troublesome problem: https://github.com/apache/gravitino/actions/runs/10452698362/job/28942732918?pr=4469 It is caused by a bug in Hive, and the Iceberg project has hit the same problem: https://github.com/apache/iceberg/pull/2722#issuecomment-867363019

This Hive bug only happens with Derby. In our environment, the metastore storage is MySQL, so we don't encounter this problem.

mygrsun avatar Aug 21 '24 05:08 mygrsun

@mygrsun Is there any progress? May I take this on?

mchades avatar Sep 26 '24 08:09 mchades

Thanks, you can do it.

mygrsun avatar Sep 26 '24 08:09 mygrsun

I have verified locally that this PR reduces the time to list 1000 tables from 2043ms to 14ms.

It's ready for review now. @jerryshao

mchades avatar Nov 01 '24 11:11 mchades