Concurrent DBT jobs are failing on populating elementary tables in Redshift
Describe the bug
We are using DBT cloud to run multiple DBT jobs. Each runs on a 10-minute cadence, uses the same DBT schema, and connects to Redshift DW.
So, there's a high probability that two different jobs are attempting to write into tables in the Elementary schema at the same time. And when that happens, we run into Serializable isolation violation errors, and our DBT jobs are aborted.
After digging into debug logs, I've found this interesting lines:
2023-09-13 12:42:31.851461 (MainThread): 12:42:31 On master: /* {"app": "dbt", "dbt_version": "1.3.5", "profile_name": "user", "target_name": "default", "connection_name": "master"} */
create temporary table
"dbt_tests__tmp_20230913124231846680124231848433"
as (
SELECT
*
FROM "analytics_elementary"."dbt_tests"
WHERE 1 = 0
);
...
2023-09-13 12:42:47.278637 (MainThread): 12:42:47 On master: /* {"app": "dbt", "dbt_version": "1.3.5", "profile_name": "user", "target_name": "default", "connection_name": "master"} */
begin transaction;
delete from "analytics_elementary"."dbt_tests"; -- truncate supported in Redshift transactions, but causes an immediate commit
insert into "analytics_elementary"."dbt_tests" select * from "dbt_tests__tmp_20230913124231846680124231848433";
commit;
2023-09-13 12:42:48.232158 (MainThread): 12:42:48 Postgres adapter: Postgres error: 1023
DETAIL: Serializable isolation violation on table - 88024998, transactions forming the cycle are: 954315228, 954315158 (pid:1073760694)
So, at 12:42:31 UTC, we create a temp table with the content of dbt_tests. Then, 16 seconds later, at 12:42:47 UTC, we "truncate" dbt_tests and insert everything from the temp table. Then, it fails because another DBT job is executing the same DELETE+INSERT commands.
I've confirmed in svv_table_info that the table 88024998 is indeed dbt_tests, and that the two transaction IDs (954315228, 954315158) are referring to the DELETE+INSERT transaction blocks in two different DBT jobs.
To Reproduce
Steps to reproduce the behavior:
- Create a new DBT project with two models, ex:
model_1andmodel_2 - Add Redshift target to the project
- Add Elementary package to the project
- Execute
dbt run --select model_1anddbt run --select model_2at the same time - See an error -
Serializable isolation violation on table
Expected behavior
Concurrent DBT executions with Elementary enabled are working. I can imagine the fix would be:
- Create a new empty temp table
- Insert new records into the temp table
- Insert into the final table (
analytics_elementary.dbt_tests) everything from the temp table
Environment (please complete the following information):
- dbt - 1.3.2
- dbt redshift adapter - 1.3.2
- elementary-data/elementary - 0.10.2
pip freeze:
agate==1.6.3
agate-dbf==0.2.2
agate-excel==0.2.5
agate-sql==0.5.9
attrs==22.2.0
Babel==2.11.0
black==23.7.0
boto3==1.26.46
botocore==1.29.46
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
csvkit==1.1.1
dbfread==2.0.7
dbt-core==1.3.2
dbt-extractor==0.4.1
dbt-postgres==1.3.2
dbt-redshift==1.3.0
et-xmlfile==1.1.0
future==0.18.3
greenlet==2.0.2
hologram==0.0.15
idna==3.4
importlib-metadata==6.0.0
isodate==0.6.1
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==3.2.0
leather==0.3.4
Logbook==1.5.3
MarkupSafe==2.1.1
mashumaro==3.0.4
minimal-snowplow-tracker==0.0.2
msgpack==1.0.4
mypy-extensions==1.0.0
networkx==2.6.3
olefile==0.46
openpyxl==3.1.2
packaging==21.3
parsedatetime==2.4
pathspec==0.9.0
platformdirs==3.10.0
psycopg2-binary==2.9.5
pycparser==2.21
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-slugify==7.0.0
pytimeparse==1.1.8
pytz==2022.7
PyYAML==6.0
requests==2.31.0
s3transfer==0.6.0
six==1.16.0
SQLAlchemy==1.4.49
sqlparse==0.4.4
text-unidecode==1.3
tomli==2.0.1
typing_extensions==4.4.0
urllib3==1.26.13
Werkzeug==2.2.3
xlrd==2.0.1
zipp==3.11.0
packages.yml:
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1
- package: elementary-data/elementary
version: 0.10.2
This also happens to us using dbt v1.4 and elementary-data==0.10.0 on Redshift. For now, we had to disable elementary because of this. Looking forward to a solution!
Hi all, Closing this issue since it had no activity in the past 3 months, but if it's still relevant for anyone please feel free to re-open.
Thanks, Itamar