feat: Add CLL to OpenLineage in BigQueryInsertJobOperator
BigQueryInsertJobOperator already supports OpenLineage for QUERY-type jobs, but it lacks Column-Level Lineage (CLL).
This PR adds CLL to this operator based on SQL parsing, which can be useful in straightforward scenarios. However, SQL parsing alone does not always provide all the details: for example, a query can reference a table by its name alone, or as dataset.table without the project_id. Checks have therefore been added to keep the lineage accurate. As a result, CLL is omitted when its correctness is uncertain.
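One way to picture such a check is below. This is a hypothetical sketch, not the code in this PR: `TableRef`, `qualify` and `column_lineage_is_safe` are illustrative names, and it assumes the job's default project and default dataset are known. Each table the SQL parser found is expanded to `project.dataset.table` using those defaults, and CLL is emitted only if every expanded name matches a table BigQuery itself reported for the job.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TableRef:
    """Hypothetical: a table reference as parsed from SQL; project/dataset may be missing."""
    project: Optional[str]
    dataset: Optional[str]
    table: str


def qualify(ref: TableRef, default_project: Optional[str], default_dataset: Optional[str]) -> Optional[str]:
    """Return 'project.dataset.table', or None if a part cannot be determined."""
    project = ref.project or default_project
    dataset = ref.dataset or default_dataset
    if project is None or dataset is None:
        return None  # ambiguous reference, e.g. a bare table name with no default dataset
    return f"{project}.{dataset}.{ref.table}"


def column_lineage_is_safe(
    parsed_refs: Iterable[TableRef],
    referenced_tables: Iterable[str],
    default_project: Optional[str] = None,
    default_dataset: Optional[str] = None,
) -> bool:
    """Hypothetical check: only trust CLL if every parsed table maps onto a reported table."""
    known = set(referenced_tables)
    for ref in parsed_refs:
        fq = qualify(ref, default_project, default_dataset)
        if fq is None or fq not in known:
            return False  # uncertain, so skip CLL rather than risk wrong lineage
    return True
```

For example, `column_lineage_is_safe([TableRef(None, "ds", "t1")], ["p.ds.t1"], default_project="p")` returns `True`, while the same call without `default_project` returns `False`. Failing closed like this trades some coverage for correctness, which matches the intent of omitting CLL when in doubt.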
There is also a change unrelated to CLL: currently, the output table is duplicated into the input tables. We build the list of input tables from the referencedTables property provided by Google, and, as it turns out, this property also includes the destination table. For example, this query:
INSERT INTO a.b.c VALUES (1, "a", 23)
would report a.b.c as both an input table and an output table.
This PR fixes that by removing the output table from the input tables. I am not sure this is the correct approach, since users may sometimes write a query that moves data from a table into that same table. However, I think this is rare, and this kind of lineage information (from A to A) does not provide much value anyway. Please let me know if you think I'm wrong.
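The filtering step amounts to something like the sketch below. It is illustrative only: `inputs_excluding_output` is a hypothetical helper, and it assumes the destination table has already been resolved and that tables are dicts shaped like BigQuery's TableReference resource (`projectId`, `datasetId`, `tableId`). Tables are compared on all three fields and the original order of the remaining inputs is kept.

```python
from typing import Dict, List, Optional, Tuple

# Shaped like BigQuery's TableReference resource.
TableReference = Dict[str, str]


def _key(ref: TableReference) -> Tuple[str, str, str]:
    """Identity of a table: (project, dataset, table)."""
    return (ref["projectId"], ref["datasetId"], ref["tableId"])


def inputs_excluding_output(
    referenced_tables: List[TableReference],
    destination_table: Optional[TableReference],
) -> List[TableReference]:
    """Hypothetical helper: drop the destination table from referencedTables, keeping order."""
    if destination_table is None:
        return list(referenced_tables)
    dest = _key(destination_table)
    return [ref for ref in referenced_tables if _key(ref) != dest]
```

For `INSERT INTO a.b.c VALUES (1, "a", 23)`, referencedTables contains only `a.b.c`, so the filtered input list is empty. The trade-off mentioned above also shows up here: a query that reads from and writes to the same table would lose that input edge.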
I also refactored the mixin a bit to make it clearer and to prepare for supporting job types other than QUERY. I also changed the class name: it was originally meant to be a general mixin, but BigQueryInsertJobOperator is so complex that this mixin will only be used with that class.