[SqlGlot] Improve Lineage Parsing

Open harshach opened this issue 1 month ago • 9 comments

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • [ ] Bug fix
  • [ ] Improvement
  • [ ] New feature
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation

Checklist:

  • [x] I have read the CONTRIBUTING document.
  • [ ] My PR title is Fixes <issue-number>: <short explanation>
  • [ ] I have commented on my code, particularly in hard-to-understand areas.
  • [ ] For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • New SQLGlot parser:
    • SQLGlotLineageRunner in ingestion/lineage/sqlglot_parser.py reports a 100% parse success rate (vs 42% with SQLFluff) and runs 27x faster
    • Supports column-level lineage extraction with schema awareness and query literal masking
  • Parser selection factory:
    • create_lineage_parser in parser_selection.py supports 4 modes: sqlglot (default), sqlfluff (deprecated), sqlparse (fallback), auto
    • Integrated across 15+ dashboard sources (Grafana, Looker, Superset, Tableau, etc.) and query parser processor
  • Configuration schema:
    • Added lineageParserType field to 24 database connection schemas with default value sqlglot
    • Common schema in database/common/lineageParserConfig.json referenced via $ref
  • Comprehensive testing:
    • 4 new test suites with 1500+ lines validating complex SQL patterns (CTEs, window functions, dbt models, dialect-specific syntax)
    • Tests cover GitHub issue queries, production queries, and parser selection logic
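
To make the parser-selection bullet concrete, here is a minimal sketch of how a four-mode selector could behave. It is not the code from parser_selection.py: the names ParserMode, pick_parser, AUTO_ORDER and the stub backends are hypothetical, and the try-in-order behaviour for the auto mode is an assumption, since the summary only lists the mode names.

```python
from enum import Enum
from typing import Callable, Dict, List


class ParserMode(str, Enum):
    """The four mode names listed in the summary."""
    SQLGLOT = "sqlglot"    # default
    SQLFLUFF = "sqlfluff"  # deprecated
    SQLPARSE = "sqlparse"  # fallback
    AUTO = "auto"


# A backend takes a SQL string and returns some lineage result (here: a list of strings).
ParseFn = Callable[[str], List[str]]

# Assumption: "auto" tries backends in this order; the PR summary does not state the order.
AUTO_ORDER = (ParserMode.SQLGLOT, ParserMode.SQLFLUFF, ParserMode.SQLPARSE)


def pick_parser(backends: Dict[ParserMode, ParseFn], mode: str = "sqlglot") -> ParseFn:
    """Return the backend for `mode`, or a try-in-order wrapper for "auto"."""
    selected = ParserMode(mode)  # raises ValueError for an unknown mode name
    if selected is not ParserMode.AUTO:
        return backends[selected]

    def try_in_order(sql: str) -> List[str]:
        last_error = None
        for candidate in AUTO_ORDER:
            backend = backends.get(candidate)
            if backend is None:
                continue  # this backend is not configured
            try:
                return backend(sql)
            except Exception as exc:  # any backend failure moves on to the next one
                last_error = exc
        raise ValueError("no configured parser could handle the query") from last_error

    return try_in_order


# Hypothetical stub backends, standing in for the real parsers.
def strict_stub(sql: str) -> List[str]:
    if sql.lstrip().upper().startswith("MERGE"):
        raise ValueError("unsupported statement")
    return ["strict"]


def lenient_stub(sql: str) -> List[str]:
    return ["lenient"]


BACKENDS = {ParserMode.SQLGLOT: strict_stub, ParserMode.SQLPARSE: lenient_stub}
```

With these stubs, the default mode returns the first backend directly, while pick_parser(BACKENDS, "auto") falls through to the lenient stub when the strict one raises.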

harshach avatar Nov 12 '25 01:11 harshach

TypeScript types have been updated based on the JSON schema changes in the PR

github-actions[bot] avatar Nov 12 '25 01:11 github-actions[bot]

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.12)

Vulnerabilities (14)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
linux-libc-dev | CVE-2025-39931 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39949 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39955 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39968 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39970 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39971 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39973 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39980 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39982 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39993 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39994 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-39998 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-40040 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1
linux-libc-dev | CVE-2025-40096 | 🚨 HIGH | 6.1.153-1 | 6.1.158-1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (31)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0
com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0
com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4
com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9
com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2
commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0
commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0
dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0
io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final
io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final
net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4
net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9
org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4
org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3
org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2
org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0
org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0
org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0
org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (1)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

github-actions[bot] avatar Nov 12 '25 01:11 github-actions[bot]

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.9)

Vulnerabilities (19)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
libexpat1 | CVE-2023-52425 | 🚨 HIGH | 2.5.0-1+deb12u1 | 2.5.0-1+deb12u2
libexpat1 | CVE-2024-8176 | 🚨 HIGH | 2.5.0-1+deb12u1 | 2.5.0-1+deb12u2
libgnutls30 | CVE-2025-32988 | 🚨 HIGH | 3.7.9-2+deb12u3 | 3.7.9-2+deb12u5
libgnutls30 | CVE-2025-32990 | 🚨 HIGH | 3.7.9-2+deb12u3 | 3.7.9-2+deb12u5
libicu72 | CVE-2025-5222 | 🚨 HIGH | 72.1-3 | 72.1-3+deb12u1
libperl5.36 | CVE-2023-31484 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u3
libperl5.36 | CVE-2024-56406 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u2
libsqlite3-0 | CVE-2025-6965 | 🔥 CRITICAL | 3.40.1-2+deb12u1 | 3.40.1-2+deb12u2
libxslt1.1 | CVE-2024-55549 | 🚨 HIGH | 1.1.35-1 | 1.1.35-1+deb12u1
libxslt1.1 | CVE-2025-24855 | 🚨 HIGH | 1.1.35-1 | 1.1.35-1+deb12u1
libxslt1.1 | CVE-2025-7424 | 🚨 HIGH | 1.1.35-1 | 1.1.35-1+deb12u2
perl | CVE-2023-31484 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u3
perl | CVE-2024-56406 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u2
perl-base | CVE-2023-31484 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u3
perl-base | CVE-2024-56406 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u2
perl-modules-5.36 | CVE-2023-31484 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u3
perl-modules-5.36 | CVE-2024-56406 | 🚨 HIGH | 5.36.0-7+deb12u1 | 5.36.0-7+deb12u2
sqlite3 | CVE-2025-6965 | 🔥 CRITICAL | 3.40.1-2+deb12u1 | 3.40.1-2+deb12u2
sudo | CVE-2025-32462 | 🚨 HIGH | 1.9.13p3-1+deb12u1 | 1.9.13p3-1+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (31)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0
com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0
com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4
com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9
com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2
commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0
commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0
dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0
io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final
io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final
net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4
net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9
org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4
org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3
org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2
org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0
org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0
org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0
org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (10)

Package | Vulnerability ID | Severity | Installed Version | Fixed Version
Authlib | CVE-2025-59420 | 🚨 HIGH | 1.3.1 | 1.6.4
Authlib | CVE-2025-61920 | 🚨 HIGH | 1.3.1 | 1.6.5
Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3
aiomysql | CVE-2025-62611 | 🚨 HIGH | 0.2.0 | 0.3.0
apache-airflow-providers-common-sql | CVE-2025-30473 | 🚨 HIGH | 1.21.0 | 1.24.1
deepdiff | CVE-2025-58367 | 🔥 CRITICAL | 7.0.1 | 8.6.1
redshift-connector | CVE-2025-5279 | 🚨 HIGH | 2.1.5 | 2.1.7
setuptools | CVE-2024-6345 | 🚨 HIGH | 65.5.1 | 70.0.0
setuptools | CVE-2025-47273 | 🚨 HIGH | 65.5.1 | 78.1.1
tornado | CVE-2025-47287 | 🚨 HIGH | 6.4.2 | 6.5

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

github-actions[bot] avatar Nov 12 '25 01:11 github-actions[bot]

Jest test Coverage

UI tests summary

Lines | Statements | Branches | Functions
Coverage: 63% | 63.95% (50435/78861) | 41.35% (24375/58943) | 44.92% (7716/17179)

github-actions[bot] avatar Nov 12 '25 03:11 github-actions[bot]

TypeScript types have been updated based on the JSON schema changes in the PR

github-actions[bot] avatar Nov 12 '25 06:11 github-actions[bot]

TypeScript types have been updated based on the JSON schema changes in the PR

github-actions[bot] avatar Nov 12 '25 06:11 github-actions[bot]

TypeScript types have been updated based on the JSON schema changes in the PR

github-actions[bot] avatar Dec 08 '25 02:12 github-actions[bot]

Closing this as we are handling it differently now - https://github.com/open-metadata/OpenMetadata/pull/24729

mohittilala avatar Dec 14 '25 13:12 mohittilala

🎸 Gitar is monitoring - nothing to report at this time

Gitar will analyze CI failures, apply rules, and respond to comments when relevant.

Options:

  • Auto-apply: off. Gitar will not commit updates to this branch.
  • Display: compact. Hiding non-applicable rules.

Comment with these commands to change:

  • Auto-apply: gitar auto-apply:on
  • Display: gitar display:verbose

This comment will update automatically (Docs)

gitar-bot[bot] avatar Dec 14 '25 13:12 gitar-bot[bot]