
[HUDI-4503] support for parsing identifier with catalog

Open YannByron opened this issue 3 years ago • 4 comments

What is the purpose of the pull request

This PR adds support for identifiers in the catalog.database.table format.

For Spark 3 versions that support catalogs, we cannot simply transform an UnresolvedRelation into a TableIdentifier directly. When a query also involves tables from another catalog, that conversion fails and throws an exception like the one in https://github.com/apache/hudi/issues/6223#issuecomment-1198952333.

Brief change log

Apply spark.sessionState.analyzer.CatalogAndIdentifier to parse the identifier, whether or not it includes a catalog.
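To illustrate the kind of resolution this relies on, here is a minimal, dependency-free sketch of splitting a multi-part name into catalog, namespace, and table. It is not Spark's or Hudi's code: `resolve_identifier`, `ResolvedName`, `known_catalogs`, and the defaults `spark_catalog` / `default` are hypothetical names chosen for illustration, and the rule shown (treat the first part as a catalog only if it names a registered catalog, otherwise fall back to the current catalog) is an assumption about the general approach. Special cases that Spark handles, such as temporary views, are omitted.

```python
from typing import Iterable, NamedTuple, Sequence, Tuple


class ResolvedName(NamedTuple):
    """Hypothetical result type: where a table name was resolved to."""
    catalog: str
    namespace: Tuple[str, ...]
    table: str


def resolve_identifier(
    name_parts: Sequence[str],
    known_catalogs: Iterable[str],
    current_catalog: str = "spark_catalog",          # assumed default
    current_namespace: Sequence[str] = ("default",),  # assumed default
) -> ResolvedName:
    """Split a multi-part name into (catalog, namespace, table).

    Illustrative rule only:
      * one part    -> current catalog and current namespace
      * many parts  -> first part is the catalog if it is registered,
                       otherwise every leading part is namespace in the
                       current catalog
    """
    parts = tuple(name_parts)
    if not parts:
        raise ValueError("identifier must have at least one part")

    if len(parts) == 1:
        return ResolvedName(current_catalog, tuple(current_namespace), parts[0])

    head, tail = parts[0], parts[1:]
    if head in set(known_catalogs):
        # catalog.db.table (or catalog.table): strip the catalog part
        return ResolvedName(head, tail[:-1], tail[-1])

    # db.table: no catalog given, so stay in the current catalog
    return ResolvedName(current_catalog, parts[:-1], parts[-1])
```

With a registered catalog named `other`, the name `other.db.t` resolves to catalog `other`, while `db.t` stays in the current catalog. Collapsing the name into a two-part database/table identifier loses exactly this distinction.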

Verify this pull request

TestSpark3Catalog UT

Committer checklist

  • [ ] Has a corresponding JIRA in PR title & commit

  • [ ] Commit message is descriptive of the change

  • [ ] CI is green

  • [ ] Necessary doc changes done or have another open PR

  • [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

YannByron avatar Aug 01 '22 08:08 YannByron

@alexeykudinkin please help to review this again.

YannByron avatar Aug 08 '22 13:08 YannByron

@YannByron can you please also update the description to make sure it has relevant info?

alexeykudinkin avatar Aug 09 '22 20:08 alexeykudinkin

CI report:

  • 066e20303c737b6c0b441c5a92cb406ca45386ba Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Aug 10 '22 05:08 hudi-bot

@YannByron @xushiyan I also started experimenting with this issue myself in https://github.com/apache/hudi/pull/6361/files, to check my hunch that we can avoid pulling more resolution logic from Spark into Hudi.

Let's hold off on merging this PR until we confirm whether we can avoid pulling in the resolution logic and make things simpler in the end.

alexeykudinkin avatar Aug 11 '22 18:08 alexeykudinkin