[Bug] dbt-postgres transaction management: extraneous `BEGIN` when listing schemas
Is this a new bug in dbt-core?
- [X] I believe this is a new bug in dbt-core
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
"dbt run" on postgres tries to open transactions while another is already in progress. As a result, PostgreSQL displays warnings.
A run on a single model raises 24 warnings.
Expected Behavior
"dbt run" should not try to run neested transactions on PostgreSQL.
Steps To Reproduce
- run a "dbt run" command on one model
- look at PG logs
- you can also set log_statement to all to see the "BEGIN" statements sent to PG
Relevant log output
2024-01-24 12:09:27 2024-01-24 11:09:27.210 UTC [8955] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.224 UTC [8956] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.224 UTC [8957] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.225 UTC [8954] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.268 UTC [8958] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.282 UTC [8961] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.300 UTC [8959] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.300 UTC [8960] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.327 UTC [8963] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.331 UTC [8962] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.351 UTC [8965] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.362 UTC [8964] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.374 UTC [8966] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.393 UTC [8967] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.448 UTC [8970] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.449 UTC [8968] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.449 UTC [8969] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.499 UTC [8971] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.501 UTC [8972] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.502 UTC [8973] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.549 UTC [8974] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.570 UTC [8974] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.654 UTC [8975] WARNING: there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.767 UTC [8976] WARNING: there is already a transaction in progress
Environment
- OS: docker / debian
- Python: 3.11
- dbt: 1.7.4
- dbt-postgres: 1.7.4
Which database adapter are you using with dbt?
postgres
Additional Context
No response
Thanks for reaching out @Michael-cd30 !
I can see why you'd want to avoid extraneous warning messages in your PG logs. However, unless it it causing errors while building data sets in dbt this is unlikely to be a priority for us. Accordingly, I'm going to label this as "help wanted" for a motivated community member to solve.
@Michael-cd30 How many threads do you have your dbt project configured for? If you configured dbt for 24 threads but there's a limit on the db user to 1 thread, I could see this scenario happening.
Hi @alison985
The used profile (targetdev in screenshot) is configured to run 4 threads. And, for the test I've run only one model so I don't think the problem deals with concurrency.
@Michael-cd30 Technically a database user account could be running queries outside of dbt simultaneously. Also, technically, in some places you can limit total concurrent threads on the database regardless of user. However, I assume you already accounted for those.
I also agree 1 model creation, with no dependencies or tests, having 24 warning messages is strange. It would be great to see
I would have expected to start seeing an exponential back-off timestamps for something that's trying to get a thread. Other things that stand out to me as differences from my normal work(and therefore may be related) are:
- You seem to be running Microsoft
- It's odd to me that the dbt-postgres and dbt versions line up exactly. I would have expected to see a difference.
- Some of the milliseconds on the warning timestamps are the same for two rows consecutively.
- You're running a
dbt run --selectto a filename instead of the name of the model
Any chance that model has sub-queries or CTEs in it? Or a loop or macro it's calling?
Though honestly, at this point I'd personally go to the database server logs instead of the dbt logs. Figure out what is happening there and then figure out what dbt must be doing to cause that message.
@alison985 - We first saw the problem on our servers : a dedicated PostgreSQL server + a Docker server from which Apache Airflow runs DBT. Both are Linux.
In order to write this issue, I've tested in a simpler context, directly from my Windows workstation. In this simple case, the PostgreSQL service runs locally, so I'm sure that no other command is running in parallel.
The query uses CTE as recommended, but remains very simple and loop-free.
The dbt-postgres connector is maintained by dbt-labs and the code is in the same repository as the core, which probably explains why the version is identical, see : https://github.com/dbt-labs/dbt-core/tree/main/plugins/postgres/dbt/adapters/postgres
I've also looked at PostgreSQL logs activating log_statement = 'all' in order to see commands received by the server. It seems that DBT starts by analyzing the contents of all schemas. This analysis starts with two "BEGIN" for each schema. One would be enough and, because the analysis is a simple SELECT, it doesn't even seem useful to open transactions.
Here's an extract:
2024-02-01 23:21:11.910 CET [25336] LOG: instruction : BEGIN 2024-02-01 23:21:11.910 CET [25336] LOG: instruction : BEGIN 2024-02-01 23:21:11.910 CET [25336] ATTENTION: une transaction est déjà en cours 2024-02-01 23:21:11.912 CET [25336] LOG: instruction : /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "taradata", "target_name": "localdev", "connection_name": "list_taradata_wrh_administratif"} */ select 'taradata' as database, tablename as name, schemaname as schema, 'table' as type from pg_tables where schemaname ilike 'wrh_administratif' union all select 'taradata' as database, viewname as name, schemaname as schema, 'view' as type from pg_views where schemaname ilike 'wrh_administratif' union all select 'taradata' as database, matviewname as name, schemaname as schema, 'materialized_view' as type from pg_matviews where schemaname ilike 'wrh_administratif'
2024-02-01 23:21:11.917 CET [9672] LOG: instruction : BEGIN 2024-02-01 23:21:11.917 CET [9672] LOG: instruction : BEGIN 2024-02-01 23:21:11.917 CET [9672] ATTENTION: une transaction est déjà en cours 2024-02-01 23:21:11.918 CET [23792] LOG: instruction : BEGIN 2024-02-01 23:21:11.919 CET [23792] LOG: instruction : BEGIN 2024-02-01 23:21:11.919 CET [23792] ATTENTION: une transaction est déjà en cours 2024-02-01 23:21:11.919 CET [9672] LOG: instruction : /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "taradata", "target_name": "localdev", "connection_name": "list_taradata_stg_data_laregion_fr"} */ select 'taradata' as database, tablename as name, schemaname as schema, 'table' as type from pg_tables where schemaname ilike 'stg_data_laregion_fr' union all select 'taradata' as database, viewname as name, schemaname as schema, 'view' as type from pg_views where schemaname ilike 'stg_data_laregion_fr' union all select 'taradata' as database, matviewname as name, schemaname as schema, 'materialized_view' as type from pg_matviews where schemaname ilike 'stg_data_laregion_fr'
2024-02-01 23:21:11.919 CET [22488] LOG: instruction : BEGIN 2024-02-01 23:21:11.919 CET [22488] LOG: instruction : BEGIN
and so on...
I haven't tested but I'm quite sure the more "schema" my dbt-project will have, the more warning I'll receive.
Regards,
We have a similar problem in greenplum during dbt execution. WARNING ERROR_TEXT: there is already a transaction in progress. Example dbt execute: dbt build --models folder.chain_folder.*
Thanks for that info @Michael-cd30 and @manticoreroko !
We're unable to be prioritize this ourselves right now, but we'd be willing to give a look at a PR submitted by a community member.
Here's a couple places to start looking:
- https://github.com/dbt-labs/dbt-adapters/blob/154ca610f67906b0affea9a1e85d29126721cafd/dbt/adapters/sql/connections.py#L159-L160
- https://github.com/dbt-labs/dbt-adapters/blob/154ca610f67906b0affea9a1e85d29126721cafd/dbt/adapters/sql/connections.py#L65-L66
Theoretically, the code is already supposed to check if there's already a known transaction that is open or not, but it must not be working somehow.
I'd suggest setting threads: 1 within profiles.yml just to be sure multiple threads aren't somehow related.
Same with 1 thread.
@Michael-cd30 were you able to find a fix for this?