[Bug report] list-table API is very slow when the number of tables is very large

Open mygrsun opened this issue 1 year ago • 9 comments

Version

main branch

Describe what's wrong

In my tests, I found that list-table takes 300 s when a schema has 5000 tables. I analyzed the code, added some logs, and found that the cause is the call to the getTableObjectsByName interface. list-table uses getTableObjectsByName, and this metastore interface is very slow. image

Error message and/or stacktrace

I added some logs at 3 positions. image

The result is:

image
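
For readers who cannot see the screenshots, here is a minimal sketch of how per-step timing logs like these could be collected. `StepTimer` and the step labels are hypothetical names for illustration only; this is not the code from the screenshots.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: records wall-clock time per labeled step. */
class StepTimer {
  // label -> accumulated elapsed milliseconds, in insertion order
  private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

  /** Runs the step, records how long it took, and returns its result. */
  <T> T time(String label, Supplier<T> step) {
    long start = System.nanoTime();
    try {
      return step.get();
    } finally {
      long ms = (System.nanoTime() - start) / 1_000_000;
      elapsedMillis.merge(label, ms, Long::sum);
    }
  }

  Map<String, Long> report() {
    return elapsedMillis;
  }

  public static void main(String[] args) {
    StepTimer timer = new StepTimer();
    // Placeholder steps; in a real test these would wrap the metastore calls.
    timer.time("list table names", () -> "names");
    timer.time("load table objects", () -> "objects");
    timer.time("filter tables", () -> "filtered");
    timer.report().forEach((label, ms) -> System.out.println(label + ": " + ms + " ms"));
  }
}
```

Wrapping each of the three positions this way makes it easy to see which call dominates the total time.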

How to reproduce

Add 5000 tables to one schema.

Additional context

No response

mygrsun avatar Jul 05 '24 09:07 mygrsun

I found that getTableObjectsByName is used mainly to filter managed (internal) and external tables, as well as to filter out Iceberg tables. If I don't filter on managed and external tables, what is the impact here? What additional types of tables would be returned?

mygrsun avatar Jul 05 '24 09:07 mygrsun

Can you provide the direct query time for HMS without Gravitino?

We have tested it. When I execute "show tables" in Hive Beeline and Spark, it is very fast. I guess HiveServer2 doesn't use the getTableObjectsByName interface, because "show tables" just returns table names.

mygrsun avatar Jul 05 '24 09:07 mygrsun

time1 and time2 do not seem to appear in the picture?

mchades avatar Jul 05 '24 09:07 mchades

time1 and time2 do not seem to appear in the picture?

Sorry, I will send you a new one.

mygrsun avatar Jul 05 '24 09:07 mygrsun

time1 and time2 do not seem to appear in the picture?

image

mygrsun avatar Jul 05 '24 09:07 mygrsun

image

I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach. However, I did not handle filtering managed and external tables, because I don't know the purpose of filtering them.

So please check this approach. If it is acceptable, I can submit a PR.
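
To illustrate the idea, here is a minimal sketch that contrasts the two strategies using an in-memory stand-in. `FakeMetastore`, its simplified method signatures, and the `table_type` parameter key are assumptions made for this example; it is not the real Hive client API or the Gravitino code. In the real Hive client, listTableNamesByFilter takes a database name, a filter string, and a maximum count, and the exact filter expression is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a metastore; NOT the real Hive client API. */
class FakeMetastore {
  // table name -> table parameters (assumed key: "table_type")
  private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();
  // counts how many full table objects were returned to the caller
  int objectsLoaded = 0;

  void addTable(String name, Map<String, String> params) {
    tables.put(name, params);
  }

  List<String> getAllTableNames() {
    return new ArrayList<>(tables.keySet());
  }

  // Plays the role of getTableObjectsByName: full metadata for every requested table.
  Map<String, Map<String, String>> getTableObjectsByName(List<String> names) {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    for (String name : names) {
      if (tables.containsKey(name)) {
        result.put(name, tables.get(name));
        objectsLoaded++;
      }
    }
    return result;
  }

  // Plays the role of listTableNamesByFilter: the store filters, only names come back.
  List<String> listTableNamesExcludingType(String excludedType) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> e : tables.entrySet()) {
      if (!excludedType.equalsIgnoreCase(e.getValue().get("table_type"))) {
        names.add(e.getKey());
      }
    }
    return names;
  }
}

class ListTablesSketch {
  // Slow path: load every table object, then filter on the client side.
  static List<String> listViaObjects(FakeMetastore ms) {
    Map<String, Map<String, String>> objects = ms.getTableObjectsByName(ms.getAllTableNames());
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> e : objects.entrySet()) {
      if (!"ICEBERG".equalsIgnoreCase(e.getValue().get("table_type"))) {
        names.add(e.getKey());
      }
    }
    return names;
  }

  // Proposed path: ask the store for the filtered names only.
  static List<String> listViaNameFilter(FakeMetastore ms) {
    return ms.listTableNamesExcludingType("ICEBERG");
  }

  static FakeMetastore sample() {
    FakeMetastore ms = new FakeMetastore();
    ms.addTable("orders", Map.of());
    ms.addTable("events_iceberg", Map.of("table_type", "ICEBERG"));
    ms.addTable("users", Map.of("EXTERNAL", "TRUE"));
    return ms;
  }

  public static void main(String[] args) {
    FakeMetastore ms = sample();
    System.out.println(listViaObjects(ms) + " objects loaded: " + ms.objectsLoaded);
    System.out.println(listViaNameFilter(ms) + " objects loaded: " + ms.objectsLoaded);
  }
}
```

Both paths return the same names, but only the first one transfers a full table object per table, which is the cost that grows with the number of tables.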

mygrsun avatar Jul 08 '24 02:07 mygrsun

image I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach. However, I did not handle filtering managed and external tables, because I don't know the purpose of filtering them.

So please check this approach. If it is acceptable, I can submit a PR.

Great! I think we can work on this way. WDYT? @jerryshao @FANNG1

mchades avatar Jul 08 '24 03:07 mchades

Great! I think we can work on this way. WDYT? @jerryshao @FANNG1

I think it's OK, because this method seems extensible and works for more than just filtering Iceberg tables.

FANNG1 avatar Jul 08 '24 07:07 FANNG1

Hi @mygrsun, is there any progress? Can I assign this issue to you?

mchades avatar Jul 15 '24 03:07 mchades