Bug - onSchemaChange doesn't work as expected in BigQuery Dataform
Steps Taken to Address the Issue:
1) Core version update: Upgraded the core version to 3.0.12 in workflow_settings.yaml.
2) Package installation: Installed the necessary dependencies.
3) Initial SQLX configuration: Created a SQLX file with the following configuration:
config {
  type: "incremental",
  schema: "test",
  onSchemaChange: "EXTEND",
}
select 1 as a, 2 as b, "asd" as c
4) Column addition: Added a new column, d, to the SQLX query:
config {
  type: "incremental",
  schema: "test",
  onSchemaChange: "EXTEND",
}
select 1 as a, 2 as b, "asd" as c, "test" as d
Expected vs. Actual Outcome:
- Expected Behavior: The newly added column d should be detected and the existing table's schema updated automatically.
- Actual Behavior: The table schema was not updated to include column d.
- Additional Issue: After adding a uniqueKey and then dropping a column, the following error occurred:
  Query error: Name c not found inside S
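For illustration only, here is a minimal Python sketch of the expected EXTEND behaviour described above. It assumes the target table's existing columns and the query's output columns (with BigQuery types) are already known: new columns are found by set difference and turned into BigQuery ALTER TABLE ... ADD COLUMN IF NOT EXISTS statements. The table name test.my_table and both function names are hypothetical, and this is not how Dataform necessarily implements it.

```python
"""Hypothetical sketch of what onSchemaChange: "EXTEND" is expected to do.

Not Dataform's implementation: the function names, the table name and the
exact DDL emitted are illustrative assumptions.
"""


def columns_to_add(existing: list[str], query: list[str]) -> list[str]:
    """Return the query's output columns that the target table does not have yet."""
    existing_set = set(existing)
    return [col for col in query if col not in existing_set]


def extend_statements(table: str, existing: list[str],
                      query: dict[str, str]) -> list[str]:
    """Build one BigQuery ADD COLUMN statement per new column.

    `query` maps each output column of the SQLX query to its BigQuery type.
    """
    new_cols = columns_to_add(existing, list(query))
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {col} {query[col]}"
        for col in new_cols
    ]


if __name__ == "__main__":
    # Table created by the first run: select 1 as a, 2 as b, "asd" as c
    existing = ["a", "b", "c"]
    # Second run adds: "test" as d
    query = {"a": "INT64", "b": "INT64", "c": "STRING", "d": "STRING"}
    for stmt in extend_statements("test.my_table", existing, query):
        print(stmt)
```

With the two queries from the reproduction steps, only column d is new, so a single ADD COLUMN statement is produced; the existing rows simply get NULL for d.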
Root Cause Analysis & Suggested Fixes:
- The current logic retrieves column names from INFORMATION_SCHEMA.COLUMNS, which does not immediately reflect newly added or removed columns.
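The "Name c not found inside S" error is consistent with this root cause. The hypothetical Python sketch below assumes that the MERGE generated for a table with a uniqueKey takes its column list from a stale INFORMATION_SCHEMA.COLUMNS snapshot that still contains the dropped column c. The source query (aliased S) no longer produces c, so any S.c reference cannot be resolved. The MERGE shape, the table name and the function names are illustrative assumptions, not Dataform's actual generated SQL.

```python
"""Hypothetical illustration of the "Name c not found inside S" failure.

Assumption: the incremental MERGE lists its columns from a snapshot of
INFORMATION_SCHEMA.COLUMNS that still contains a dropped column. This is not
Dataform's real code; names and SQL shape are illustrative only.
"""


def build_merge(table: str, unique_key: str, columns: list[str],
                source_sql: str) -> str:
    """Roughly the shape of a BigQuery MERGE for an incremental table with a uniqueKey."""
    updates = ", ".join(f"{c} = S.{c}" for c in columns)
    col_list = ", ".join(columns)
    values = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE {table} T USING ({source_sql}) S "
        f"ON T.{unique_key} = S.{unique_key} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )


def unresolved_columns(columns: list[str], source_columns: list[str]) -> list[str]:
    """Columns the MERGE would reference as S.<col> that the source query lacks."""
    present = set(source_columns)
    return [c for c in columns if c not in present]


if __name__ == "__main__":
    stale = ["a", "b", "c"]  # snapshot still lists the dropped column c
    source = ["a", "b"]      # query output after c was removed
    print(build_merge("test.my_table", "a", stale, "select 1 as a, 2 as b"))
    for col in unresolved_columns(stale, source):
        print(f"S.{col} cannot be resolved: Name {col} not found inside S")
```

Deriving the column list from the query being run, rather than from a possibly stale metadata snapshot, leaves nothing unresolved in this sketch.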
Unfortunately, this feature is not yet ready for use. We'll publish a release note when it is properly available.
@GJMcGowan I am using Dataform 3.0.23 in BigQuery and this issue still exists. Is there anything I need to do to make it work?
This feature is still not released officially, so you can't use it yet.
Hello @kolina, thank you for your response.
For anyone looking in the future: the official BigQuery Dataform release is currently at version 3.0.0. You can find the release notes here: https://cloud.google.com/dataform/docs/release-notes
That note was about the release of v3 of @dataform/core; you can still use 2.x or 3.x versions, depending on what you set in your Dataform project.
Support of managed incremental schema updates in GCP Dataform requires changes to @dataform/core (already released) and changes in the GCP Dataform execution engine (hasn't been officially released yet, WIP).
Thank you for clarifying this. How can I tell which version I can use in my Dataform project? For example, I had 3.0.7 and changed it to 3.0.23 to test incremental schema support. The workflow ran normally, but without this feature, and nothing indicated that the feature was unsupported.
After we officially launch support in our GCP execution engine, you'll be able to use it with existing @dataform/core versions where its configuration is supported.
I think it's a good call to throw an error for now until it's supported in our API.
@Tuseeq1, can you make the Dataform API return an error for now when someone tries to use the configuration for incremental schema updates?
I will add the error.
UPD: See #1991
After the discussion in the PR, we agreed to look for a simple backend solution that turns this into an error for all customers, regardless of their Dataform Core version.
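The backend check discussed above could look roughly like the hypothetical Python sketch below. It is not the Dataform API: it assumes compiled actions arrive as plain dicts and treats "IGNORE" as the no-op default value of onSchemaChange. Any other value is rejected with an explicit error instead of being silently ignored.

```python
"""Hypothetical guard: reject incremental schema-update settings until supported.

Assumptions (not from the Dataform API): actions arrive as plain dicts, and
"IGNORE" is treated as the no-op default value of onSchemaChange.
"""


class UnsupportedConfigError(ValueError):
    """Raised when an action uses a setting the execution engine cannot honour yet."""


def check_on_schema_change(action: dict) -> None:
    """Raise if the action asks for a schema-change mode other than the assumed default."""
    value = action.get("onSchemaChange")
    if value is not None and value != "IGNORE":
        name = action.get("name", "<unnamed>")
        raise UnsupportedConfigError(
            f'onSchemaChange: "{value}" is not supported yet (action {name})'
        )


if __name__ == "__main__":
    try:
        check_on_schema_change(
            {"name": "test.my_table", "type": "incremental", "onSchemaChange": "EXTEND"}
        )
    except UnsupportedConfigError as err:
        print(f"Rejected: {err}")
```

Because the check inspects the compiled action rather than the @dataform/core version that produced it, it behaves the same for every core version, which matches the goal stated in the last comment.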