BigQuery metadata ingestion profiling bug

Open dragoscadore opened this issue 3 years ago • 2 comments

Describe the bug

On BigQuery ingestion with profiling: enabled: true, DataHub doesn't know how to query arrays. I got an error like this: 'Cannot access field colors on a value with type ARRAY<STRUCT<colors BOOL>> at ' This happens because DataHub tries to query the column as hide_product_relations.colors (where hide_product_relations is an array), but in BigQuery you need to do an "unnest" before querying the fields of an array.
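
To make the difference concrete, here is a minimal sketch of the two query shapes. It is not DataHub's actual profiler code; the helper names and the table name `my_project.my_dataset.products` are hypothetical, and only the column `hide_product_relations` and the field `colors` come from the report above.

```python
def direct_field_query(table: str, array_col: str, field: str) -> str:
    """Shape of the failing query: dot access on an ARRAY<STRUCT<...>> column."""
    return f"SELECT {array_col}.{field} FROM `{table}`"


def unnest_field_query(table: str, array_col: str, field: str) -> str:
    """Flatten the array with UNNEST so each struct element becomes its own row."""
    return (
        f"SELECT elem.{field} AS {field} "
        f"FROM `{table}`, UNNEST({array_col}) AS elem"
    )


if __name__ == "__main__":
    table = "my_project.my_dataset.products"  # hypothetical table name
    print(direct_field_query(table, "hide_product_relations", "colors"))
    print(unnest_field_query(table, "hide_product_relations", "colors"))
```

The second form cross-joins each row with the elements of its own array, which is the usual BigQuery way to reach a field inside an array of structs.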

To Reproduce

Steps to reproduce the behavior:

  1. Get this recipe file:

       source:
         type: bigquery
         config:
           project_id: <project_id>
           max_query_duration: 5
           include_table_lineage: True
           profiling:
             enabled: true
       sink:
         type: "datahub-rest"
         config:
           server: "<IP>

  2. Run ingestion: datahub ingest -c recipe.yaml

Note: you need a table with array columns in your project.
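
For anyone who prefers to reproduce this from Python rather than the CLI, the recipe can be sketched as a dictionary and passed to DataHub's `Pipeline` API. This assumes `profiling` nests under `source.config`, as in standard DataHub recipes; the `run_ingestion` helper is a hypothetical wrapper, and `<project_id>` and `<IP>` are the placeholders from the report (the original server value is truncated), not real values.

```python
# The recipe from the steps above, expressed as a Python dict.
recipe = {
    "source": {
        "type": "bigquery",
        "config": {
            "project_id": "<project_id>",  # placeholder from the report
            "max_query_duration": 5,
            "include_table_lineage": True,
            "profiling": {"enabled": True},
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "<IP>"},  # placeholder; original value is truncated
    },
}


def run_ingestion(recipe_dict: dict) -> None:
    """Run the recipe programmatically (needs acryl-datahub and BigQuery access)."""
    # Imported lazily so the recipe can be inspected without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe_dict)
    pipeline.run()
    pipeline.raise_from_status()  # raises if the run reported failures
```

Calling `run_ingestion(recipe)` against a project that has a table with array columns should exercise the same profiling path as the CLI command.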

Expected behavior

The profiler needs to know how to do the "unnest" when it queries array columns.

dragoscadore • Feb 03 '22 07:02

Hi folks,

Any update on this?

The ingestion job keeps failing when parsing nested columns in bigquery.

[2022-08-08 19:06:40,554] ERROR    {datahub.utilities.sql_lineage_parser_impl:95} - SQL lineage analyzer error 'An Identifier is expected, got Function[value: `UNNEST`(custom_attributes)] instead.' for query: 'SELECT *
  FROM (
        SELECT clmn0_,
               clmn1_,
               AVG(clmn3_) AS clmn100000_
          FROM (
                SELECT *
                  FROM (
                        SELECT t0.event_date AS clmn0_,
                               t0.event_name AS clmn1_,
                               t0.fail_rate AS clmn3_
                          FROM (
                                with `data_table` as (
                                        select *,
                                               __d_a_t_e(event_timestamp) AS event_date,
                                               (
                                                SELECT value
                                                  FROM `UNNEST`(custom_attributes)
                                                 WHERE key = 'success'
                                            ......
       .....
       )
 LIMIT 20000000
[2022-08-08 19:06:40,554] ERROR    {datahub.utilities.sql_lineage_parser_impl:100} - sql holder not present so cannot get tables
[2022-08-08 19:06:40,554] ERROR    {datahub.utilities.sql_lineage_parser_impl:121} - sql holder not present so cannot get columns
...
"2022-08-08 19:08:37.440574 [exec_id=b9fa58e7-aab2-449b-95a3-ea40494c415d] INFO: Failed to execute 'datahub ingest'"

Update: moving to bigquery-beta has solved the problem.

hieunt-itfoss • Aug 08 '22 12:08

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] • Sep 15 '22 02:09

I got this bug on 0.8.45

ananbas • Oct 13 '22 15:10

@ananbas, please move to the bigquery-beta source and CLI version 0.8.45.2 (see https://datahubproject.io/docs/ui-ingestion#advanced-running-with-a-specific-cli-version for how to run with a specific CLI version).

hieunt-itfoss • Oct 14 '22 12:10

This was fixed by https://github.com/datahub-project/datahub/pull/6613, and should be available in acryl-datahub v0.9.3.2.

hsheth2 • Dec 06 '22 23:12
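
A quick way to check whether an environment is at or past the release named in the last comment is to compare the installed `acryl-datahub` version against 0.9.3.2. The sketch below is only an illustration: `parse_version` and `has_fix` are hypothetical helpers, and the comparison is deliberately simple (it stops at the first non-numeric component, so a pre-release suffix such as `rc1` is treated conservatively).

```python
from importlib.metadata import PackageNotFoundError, version

# Release that the thread says should contain the fix.
FIXED_IN = (0, 9, 3, 2)


def parse_version(text: str) -> tuple:
    """Turn '0.9.3.2' into (0, 9, 3, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    """True if the given version string is at or after FIXED_IN."""
    return parse_version(installed) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("acryl-datahub")
    except PackageNotFoundError:
        print("acryl-datahub is not installed")
    else:
        status = "is at or past" if has_fix(installed) else "predates"
        print(f"acryl-datahub {installed} {status} 0.9.3.2")
```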