aws-sdk-pandas icon indicating copy to clipboard operation
aws-sdk-pandas copied to clipboard

LOCK parameter in to_sql/copy

Open aeeladawy opened this issue 3 years ago • 3 comments

Currently using wr.redshift.to_sql/copy() to store data to redshift in an airflow pipeline, where multiple dags write to the same table.

I run into this serializable isolation error: Serializable isolation violation on table

Using the lock=True parameter: the first dag instance gets stuck at the store data to redshift step, and the rest get held up until they all fail.

Any clue on handling concurrent transactions to redshift using wr?

aeeladawy avatar Jul 07 '22 11:07 aeeladawy

Just to help narrow down the problem - what to_sql/copy mode are you using? Also, is lock=True for both? Would help if you could share the code

kukushking avatar Jul 15 '22 15:07 kukushking

I used lock=True with both to_sql() and copy(), and mode="upsert"

The code was:

wr.redshift.copy(
             df=data,
             dtype=cast_dict,
             con=conn,
             schema=SCHEMA,
             table=TABLE,
             mode="upsert",
             lock=True
             primary_keys=PRIMARY_KEYS,
             path=PATH
             index=False,
         )

aeeladawy avatar Jul 16 '22 00:07 aeeladawy

@aeeladawy could you share stack trace as well please?

kukushking avatar Jul 18 '22 09:07 kukushking

Closing due to inactivity

kukushking avatar Nov 28 '22 13:11 kukushking