[ADAP-671] Statements executed one by one instead of as a whole block
Is this a regression in a recent version of dbt-redshift?
- [X] I believe this is a regression in dbt-redshift functionality
- [X] I have searched the existing issues, and I could not find an existing issue for this regression
Current Behavior
We have a macro in dbt that runs a set of DDL statements on Redshift. We concatenate those statements in a string (cmd) and run with "{% do run_query(cmd) %}". That string looks like below;
begin;
ddl_statement_1;
ddl_statement_2;
ddl_statement_3;
...
...
ddl_statement_x;
end;
After moving to dbt 1.5, we see that those statements are executed one by one, instead of sending the statement as a whole to database. We can see it both in database logs and dbt logs.
Impact
In 1.5 and onwards, these statements take more than 5 minutes to execute. Previously in versions 1.4 and below, they would execute in only a few seconds.
Expected/Previous Behavior
The below statements are executed at once in a single transaction.
begin;
ddl_statement_1;
ddl_statement_2;
ddl_statement_3;
...
...
ddl_statement_x;
end;
Steps To Reproduce
- Install dbt-core 1.5
- Install dot-redshift 1.5.6
- Run a set of ddl statements between begin and end in a macro with
"{% do run_query(cmd) %}"
Relevant log output
[0m14:34:35.967673 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_1;
[0m14:34:36.158399 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.158975 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.159297 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_1 as select * from db_name.another_schema_name.object_1 with no schema binding;
[0m14:34:36.528864 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.531540 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.532585 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_1 to group db_group;
[0m14:34:36.729741 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.730303 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.730638 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_2;
[0m14:34:36.923462 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.926900 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.927759 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_2 as select * from db_name.another_schema_name.object_2 with no schema binding;
[0m14:34:37.306524 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:37.307047 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:37.307360 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_2 to group db_group;
[0m14:34:37.514141 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
Environment
- OS: MacOS 13.4 and Ubuntu
- Python: 3.9.16
- dbt-core (working version): 1.4.6
- dbt-redshift (working version): 1.4.6
- dbt-core (regression version): 1.5.0, 1.5.1, 1.5.2
- dbt-redshift (regression version): 1.5.1, 1.5.2, 1.5.6
Additional Context
No response
We noticed the same with our post-hook, what helped us to set autocommit: False in profiles.yml (more details here)
@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the redshift_connector connector library in 1.5 instead of psycopg2, this adapter's approach to transactions have changed such that every query is executed at once.
Check out #463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding autocommit: False to your profiles.yml.
I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see https://github.com/aws/amazon-redshift-python-driver/issues/162.
@naumovdalex I'm going to close this issue for now. If possible, I think this would make a great reproducible use case to rationalize https://github.com/aws/amazon-redshift-python-driver/issues/162. Perhaps open an issue there detailing the limitation?
We noticed the same with our post-hook, what helped us to set
autocommit: Falseinprofiles.yml(more details here)
I tried this, unfortunately that didn't help as well. Also, as part of our team uses dbt Cloud IDE as well, this would be a bit more complicated for us
@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the
redshift_connectorconnector library in1.5instead ofpsycopg2, this adapter's approach to transactions have changed such that every query is executed at once.Check out #463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding
autocommit: Falseto yourprofiles.yml.I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see aws/amazon-redshift-python-driver#162.
@dataders it's not that the workflows fail, but some macros are taking more than 5 minutes to execute, instead of just a few seconds
I understand now -- that not an ideal experience. @naumovdalex can you please provide a reproducible example? This would help the engineer at Redshift work on the above linked issue on their connector library
@naumovdalex are you still experiencing this issue in the new versions of dbt-redshift?