amoro icon indicating copy to clipboard operation
amoro copied to clipboard

[AMORO-2599] ams terminal support unified catalog

Open Aireed opened this issue 1 year ago • 3 comments

Why are the changes needed?

Close #2599.

Brief change log

  • fix:spark get ClassNotFoundException when select from table under unified Catalog
  • add some logical to create unified catalog when catalog supports multi table formats
  • expose ArcticCatalog of MixedCatalog for we need to get the HMSClientPool .
  • add some method same with InternalCatalog to ExternalCatalog ,like loadTableMetadata ,createTable for it support mixed-hive format.
  • Replace the InternalCatalog class with the ServerCatalog class to be compatible with the ExternalCatalog in DefaultTableService class
  • The ExternalCatalog class should be aware of the address for AMS in order to support the mixed-hive table format.

How was this patch tested?

  • [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • [ ] Add screenshots for manual tests if appropriate

  • [x] Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Aireed avatar Mar 05 '24 09:03 Aireed

Codecov Report

Attention: Patch coverage is 10.86957% with 41 lines in your changes are missing coverage. Please review.

Project coverage is 34.71%. Comparing base (82670b6) to head (e2b24fd).

Files Patch % Lines
...c/server/dashboard/controller/TableController.java 0.00% 21 Missing :warning:
...tease/arctic/server/terminal/SparkContextUtil.java 0.00% 10 Missing :warning:
...etease/arctic/server/terminal/TerminalManager.java 0.00% 9 Missing :warning:
.../netease/arctic/server/catalog/CatalogBuilder.java 83.33% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2600      +/-   ##
============================================
- Coverage     34.72%   34.71%   -0.02%     
  Complexity     4520     4520              
============================================
  Files           608      608              
  Lines         50959    50980      +21     
  Branches       6683     6686       +3     
============================================
- Hits          17698    17697       -1     
- Misses        31808    31829      +21     
- Partials       1453     1454       +1     
Flag Coverage Δ
core 33.06% <10.86%> (-0.02%) :arrow_down:
trino 50.89% <ø> (-0.05%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 05 '24 09:03 codecov[bot]

The current code looks a bit strange in order to be compatible with mixed-hive logic.

I think there are some basic principles that should be recognized.

  1. The difference between InternalCatalog and ExternalCatalog is whether or not the metadata information of the table is stored by AMS. For HMS, it obviously belongs to ExternalCatalog.

  2. MixedHive is a special optimization of the Hive table, so it must belong to the Hive Catalog.

From the above, it can be concluded that the meta information of the MixedHive table should not be managed by AMS, and its implementation should be the same as the current MixedIceberg table under ExternalCatalog, and the mixed-format information should be stored in HMS.

But obviously, due to the historical legacy of MixedHive, operations such as LoadTable/CreateTable are still done by AMS.

In order to achieve the final UnifiedCatalog, and to be compatible with the historical MixedHive logic, I suggest splitting the following PRs.

  1. Using the UnifiedCatalog Connector in Terminal, what this PR is doing. In this PR, only the part of the Terminal about Spark Connector initialization will be modified.

  2. Making multiple Table Formats options are available on the Catalog registration page. In this PR, only the front-end selection logic was modified, changing the single-select to multi-select. Validation is performed on the backend according to the limitations of the current code

2.1 Rule1: In the Hive Catalog, if MixedHive is selected, you can only select one format. 2.2 Rule2: In the InternalCatalog, you can only select on format.

  1. Modify the implementation logic of MixedHive so that the Create/Load logic does not go through AMS, and tries to load historical tables according to historical rules

  2. Remove the rule in 2.1 && 2.2

After completing the above 4 PRs, AMS will fully support multiple formats in one Catalog and can use them in Terminal

baiyangtx avatar Mar 06 '24 09:03 baiyangtx

The current code looks a bit strange in order to be compatible with mixed-hive logic.

I think there are some basic principles that should be recognized.

  1. The difference between InternalCatalog and ExternalCatalog is whether or not the metadata information of the table is stored by AMS. For HMS, it obviously belongs to ExternalCatalog.
  2. MixedHive is a special optimization of the Hive table, so it must belong to the Hive Catalog.

From the above, it can be concluded that the meta information of the MixedHive table should not be managed by AMS, and its implementation should be the same as the current MixedIceberg table under ExternalCatalog, and the mixed-format information should be stored in HMS.

But obviously, due to the historical legacy of MixedHive, operations such as LoadTable/CreateTable are still done by AMS.

In order to achieve the final UnifiedCatalog, and to be compatible with the historical MixedHive logic, I suggest splitting the following PRs.

  1. Using the UnifiedCatalog Connector in Terminal, what this PR is doing. In this PR, only the part of the Terminal about Spark Connector initialization will be modified.
  2. Making multiple Table Formats options are available on the Catalog registration page. In this PR, only the front-end selection logic was modified, changing the single-select to multi-select. Validation is performed on the backend according to the limitations of the current code

2.1 Rule1: In the Hive Catalog, if MixedHive is selected, you can only select one format. 2.2 Rule2: In the InternalCatalog, you can only select on format.

  1. Modify the implementation logic of MixedHive so that the Create/Load logic does not go through AMS, and tries to load historical tables according to historical rules
  2. Remove the rule in 2.1 && 2.2

After completing the above 4 PRs, AMS will fully support multiple formats in one Catalog and can use them in Terminal

as discussed , This PR we are only focusing on terminal support for unified Catalog. I will raise two more PRs separately, one for frontend interface modification, and one for unified catalog compatibility with mixed-hive. When adding a new catalog, support for selecting multiple formats.

#2630 #2629

Aireed avatar Mar 12 '24 14:03 Aireed

@baiyangtx tks for your detailedly review。 i have fixed them , PTAL again

Aireed avatar Mar 24 '24 11:03 Aireed