datahub
BigQuery metadata ingestion profiling bug
Describe the bug
On BigQuery ingestion with profiling enabled (profiling: enabled: true), DataHub doesn't know how to query arrays.
I got an error like this:
'Cannot access field colors on a value with type ARRAY<STRUCT<colors BOOL>> at '
This is because DataHub tries to query the column like this: hide_product_relations.colors (where hide_product_relations is an array). In BigQuery, you need to UNNEST an array before you can query the fields of its elements.
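To make the difference concrete, here is a minimal sketch of the two query shapes. The helper names and the table name are hypothetical and only build SQL strings for illustration; this is not how DataHub's profiler is implemented.

```python
def direct_field_query(table: str, array_column: str, field: str) -> str:
    """Query shape that BigQuery rejects when array_column is ARRAY<STRUCT<...>>."""
    return f"SELECT {array_column}.{field} FROM `{table}`"


def unnest_field_query(table: str, array_column: str, field: str) -> str:
    """Query shape that flattens the array first, giving one row per element."""
    return (
        f"SELECT elem.{field} "
        f"FROM `{table}`, UNNEST({array_column}) AS elem"
    )


if __name__ == "__main__":
    table = "my_project.my_dataset.products"  # hypothetical table name
    # Fails with "Cannot access field colors on a value with type ARRAY<...>"
    print(direct_field_query(table, "hide_product_relations", "colors"))
    # Flattens the array before accessing the struct field
    print(unnest_field_query(table, "hide_product_relations", "colors"))
```

The second form relies on BigQuery's correlated join between a table and UNNEST of one of its own array columns.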
To Reproduce
Steps to reproduce the behavior:
- Get this recipe file:
  source:
    type: bigquery
    config:
      project_id: <project_id>
      max_query_duration: 5
      include_table_lineage: True
      profiling:
        enabled: true
  sink:
    type: "datahub-rest"
    config:
      server: "<IP>
- Run ingestion: datahub ingest -c recipe.yaml
Note: you need a table with array columns in your project.
Expected behavior
The profiler needs to know how to "unnest" array columns before querying them.
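One way to picture the expected behavior: before building a profiling query, check the column's type string and treat ARRAY columns differently. The sketch below is an assumption about how such a check could look; `is_array_type` and `split_columns` are hypothetical names, not DataHub functions.

```python
from typing import Dict, List, Tuple


def is_array_type(bigquery_type: str) -> bool:
    """Return True for type strings such as 'ARRAY<STRUCT<colors BOOL>>'."""
    return bigquery_type.strip().upper().startswith("ARRAY<")


def split_columns(schema: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split a {column_name: type_string} schema into (plain, array) column lists."""
    plain = [name for name, type_str in schema.items() if not is_array_type(type_str)]
    arrays = [name for name, type_str in schema.items() if is_array_type(type_str)]
    return plain, arrays


if __name__ == "__main__":
    # Hypothetical schema; only hide_product_relations comes from the report above.
    schema = {
        "product_id": "INT64",
        "hide_product_relations": "ARRAY<STRUCT<colors BOOL>>",
    }
    plain, arrays = split_columns(schema)
    print("profile directly:", plain)
    print("unnest (or skip) first:", arrays)
```

A profiler could then either skip the array columns or generate an UNNEST-based query for them.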
Hi folks,
Any update on this?
The ingestion job keeps failing when parsing nested columns in bigquery.
[2022-08-08 19:06:40,554] ERROR {datahub.utilities.sql_lineage_parser_impl:95} - SQL lineage analyzer error 'An Identifier is expected, got Function[value: `UNNEST`(custom_attributes)] instead.' for query: 'SELECT *
FROM (
SELECT clmn0_,
clmn1_,
AVG(clmn3_) AS clmn100000_
FROM (
SELECT *
FROM (
SELECT t0.event_date AS clmn0_,
t0.event_name AS clmn1_,
t0.fail_rate AS clmn3_
FROM (
with `data_table` as (
select *,
__d_a_t_e(event_timestamp) AS event_date,
(
SELECT value
FROM `UNNEST`(custom_attributes)
WHERE key = 'success'
......
.....
)
LIMIT 20000000
[2022-08-08 19:06:40,554] ERROR {datahub.utilities.sql_lineage_parser_impl:100} - sql holder not present so cannot get tables
[2022-08-08 19:06:40,554] ERROR {datahub.utilities.sql_lineage_parser_impl:121} - sql holder not present so cannot get columns
...
"2022-08-08 19:08:37.440574 [exec_id=b9fa58e7-aab2-449b-95a3-ea40494c415d] INFO: Failed to execute 'datahub ingest'"
Update: moving to bigquery-beta has solved the problem.
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
I got this bug on 0.8.45
@ananbas please move to the bigquery-beta source and CLI version 0.8.45.2 (see https://datahubproject.io/docs/ui-ingestion#advanced-running-with-a-specific-cli-version).
This was fixed by https://github.com/datahub-project/datahub/pull/6613, and should be available in acryl-datahub v0.9.3.2.