
feat(ingest/athena): Iceberg partition columns extraction

Open treff7es opened this issue 6 months ago • 3 comments

  • Change simple column names (schema field paths) from v2 to v1

  • Athena's get-metadata call doesn't return Iceberg partitions. This PR adds more advanced extraction of table properties and partitions, which parses them out of the table's CREATE TABLE statement using SQL parsing. For now this is only used for partition extraction, but in the future it could serve other use cases as well (like table ingestion from audit logs).

    This feature is experimental and disabled by default.

treff7es · May 23 '25 08:05

Codecov Report

❌ Patch coverage is 78.94737% with 84 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ngestion/source/sql/athena_properties_extractor.py | 80.05% | 71 Missing ⚠️ |
| ...gestion/src/datahub/ingestion/source/sql/athena.py | 69.76% | 13 Missing ⚠️ |


codecov[bot] · May 23 '25 09:05

🔴 Meticulous spotted visual differences in 3 of 1270 screens tested; the detected differences are available to view and approve.

Meticulous evaluated ~10 hours of user flows against your PR.

Last updated for commit 72355f6. This comment will update as new commits are pushed.

alwaysmeticulous[bot] · May 23 '25 09:05

It seems a fix and a feature are mixed in this PR. If that's the case, ideally they should be separate PRs.

Specifically, about this:

Change simple column names from v2 to v1

What are we really fixing here? Can you add more details about how the issue manifested and how this change fixes it, either here in the PR or in the code?

Also, is this downgrade of the field path version a breaking change? Those field paths (now v1) may be used in some schema field URNs, right? If so, those URNs would now be orphaned.
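To make the question concrete, the hypothetical sketch below strips the `[version=2.0]` / `[type=...]` annotations from a v2 field path, downgrades it only when the result is a single top-level name (an assumed reading of "simple fields"), and builds schemaField URNs in DataHub's `urn:li:schemaField:(<dataset urn>,<field path>)` shape. The function names and the dataset URN are illustrative. It only shows that the two URN strings differ for a simple column; whether that actually breaks existing references depends on how DataHub resolves them, which the sketch does not model.

```python
import re

# One v2 annotation token such as "[version=2.0]" or "[type=string]",
# together with the dot that follows it.
_ANNOTATION = re.compile(r"\[[^\]]*\]\.?")


def to_simple_path(field_path: str) -> str:
    """Strip v2 annotations: '[version=2.0].[type=string].name' -> 'name'."""
    return _ANNOTATION.sub("", field_path)


def downgrade_if_simple(field_path: str) -> str:
    """Hypothetical rule: only top-level (non-nested) v2 paths become v1."""
    if not field_path.startswith("[version=2.0]"):
        return field_path  # already a v1 path, leave untouched
    simple = to_simple_path(field_path)
    return simple if "." not in simple else field_path


def schema_field_urn(dataset_urn: str, field_path: str) -> str:
    """Build a schemaField URN from its parent dataset URN and a field path."""
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


dataset = "urn:li:dataset:(urn:li:dataPlatform:athena,analytics.events,PROD)"
simple = "[version=2.0].[type=string].country"
nested = "[version=2.0].[type=struct].address.[type=string].zipcode"

print(downgrade_if_simple(simple))  # country
print(downgrade_if_simple(nested) == nested)  # True (nested paths keep v2)
# The URN built from the v2 path differs from the one built from the v1 path.
print(schema_field_urn(dataset, simple) == schema_field_urn(dataset, "country"))  # False
```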

sgomezvillamor · May 26 '25 13:05

sqlglot's issue with reading escaped quotes in comments in Athena SHOW CREATE TABLE output has been fixed in https://github.com/tobymao/sqlglot/pull/5233 and will probably make it into a release soon, given recent release velocity.

ligfx · Jun 17 '25 13:06


We only convert simple fields, and it is not a breaking change.

treff7es · Jun 30 '25 06:06