datahub
feat(ingest/athena): Iceberg partition columns extraction
- Change simple column names from v2 to v1
Athena's get-metadata call doesn't return Iceberg partitions. This PR adds more advanced extraction of table properties and partitions, which reads them from the table's CREATE TABLE statement using SQL parsing. For now this is only used for partition extraction, but in the future it could be used for other use cases as well (such as table ingestion from audit logs).
This feature is experimental and disabled by default.
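To illustrate the idea of recovering partition columns from a CREATE TABLE statement, here is a minimal sketch. It is not the code in this PR: the PR relies on a real SQL parser, while this stand-in uses only the standard library and simple parenthesis matching. The function names and the sample DDL are hypothetical, and it assumes that Iceberg transforms such as `bucket(16, id)` or `day(ts)` take the column as their last argument.

```python
import re
from typing import List


def _split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def extract_partition_columns(create_sql: str) -> List[str]:
    """Hypothetical helper: return column names used in PARTITIONED BY (...)."""
    match = re.search(r"PARTITIONED\s+BY\s*\(", create_sql, re.IGNORECASE)
    if not match:
        return []
    # Walk forward to the parenthesis that closes the PARTITIONED BY clause.
    start, pos, depth = match.end(), match.end(), 1
    while pos < len(create_sql) and depth > 0:
        if create_sql[pos] == "(":
            depth += 1
        elif create_sql[pos] == ")":
            depth -= 1
        pos += 1
    if depth != 0:
        return []  # unbalanced parentheses: give up rather than guess
    clause = create_sql[start : pos - 1]

    columns = []
    for item in _split_top_level(clause):
        transform = re.match(r"^(\w+)\s*\((.*)\)$", item, re.DOTALL)
        if transform:
            # Assumption: the column is the last argument of the transform.
            item = _split_top_level(transform.group(2))[-1]
        columns.append(item.strip('`"'))
    return columns
```

A regex-based approach like this breaks on comments or string literals that contain parentheses, which is one reason a proper SQL parser is the more robust choice for the real implementation.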
Codecov Report
:x: Patch coverage is 78.94737% with 84 lines in your changes missing coverage. Please review.
:white_check_mark: All tests successful. No failed tests found.
🔴 Meticulous spotted visual differences in 3 of 1270 screens tested: view and approve differences detected.
Meticulous evaluated ~10 hours of user flows against your PR.
Last updated for commit 72355f6. This comment will update as new commits are pushed.
It seems a fix and a feature are mixed in this PR. If that's the case, these should ideally be separate PRs.
Specifically, about this:
Change simple column names from v2 to v1
What are we really fixing here? Can you add more details about how the issue showed up and how this change fixes it? Either here in the PR or in the code.
Also, is this downgrade of the field path version a breaking change? I mean, those field paths (now v1) may be used in some schema field URNs, right? If so, those URNs would now be orphaned.
sqlglot's issue with reading escaped quotes in Athena SHOW CREATE TABLE comments has been fixed in https://github.com/tobymao/sqlglot/pull/5233. It will probably make it into a release soon, based on recent velocity.
On the v2-to-v1 field path question: we only convert simple fields, so it is not a breaking change.