django-import-export
Fix txn commit hooks executed on dry run
Describe the bug TLDR: When django-import-export is used, it does not remove transaction commit hooks on dry_run. We use Django 2.2 with Postgres (latest), and we use transaction commit hooks to enqueue Celery jobs. As a result, a post_save signal ends up enqueuing the background job twice.
https://docs.djangoproject.com/en/3.1/topics/db/transactions/#django.db.transaction.on_commit https://docs.djangoproject.com/en/3.1/topics/db/transactions/#savepoints https://github.com/django-import-export/django-import-export/issues/1078
To Reproduce Steps to reproduce the behavior:
- Register a post_save signal for a model:

@receiver([post_save, post_delete], sender=ContactCohortMap)
@disable_for_loaddata
def contact_cohort_map_signal(sender, instance, **kwargs):
    if instance:
        transaction.on_commit(lambda: some_celery_task.delay('arg1'))

- Inside it, enqueue a background job in Celery (or call any function) via a commit hook:

transaction.on_commit(lambda: some_celery_task.delay('arg1'))

- Import some valid data.
- The background job is enqueued in Celery twice: once on the dry run and once on the successful commit.
Versions
- Django Import Export: 1.2.0
- Python 3.7
- Django 2.2.5
Expected behavior Commit hooks should execute only on a successful db commit.
Screenshots Can show if requested.
Additional context When import-export is used with transaction support, it creates a savepoint and then executes each row in a transaction.atomic() block.
def import_data_inner(self, dataset, dry_run, raise_errors, using_transactions, collect_failed_rows, **kwargs):
    result = self.get_result_class()()
    result.diff_headers = self.get_diff_headers()
    result.total_rows = len(dataset)
    if using_transactions:
        # when transactions are used we want to create/update/delete object
        # as transaction will be rolled back if dry_run is set
        sp1 = savepoint()
    . . .
    for i, row in enumerate(dataset.dict, 1):
        with atomic_if_using_transaction(using_transactions):
            row_result = self.import_row(
                row,
                instance_loader,
                using_transactions=using_transactions,
                dry_run=dry_run,
                **kwargs
            )
    . . .
    if using_transactions:
        if dry_run or result.has_errors():
            savepoint_rollback(sp1)
        else:
            savepoint_commit(sp1)
When you execute a db query inside nested atomic blocks, Django records each commit hook together with the ids of the child/intermediate savepoints that were active when it was registered.
def savepoint_rollback(self, sid):
    """
    Roll back to a savepoint. Do nothing if savepoints are not supported.
    """
    if not self._savepoint_allowed():
        return
    self.validate_thread_sharing()
    self._savepoint_rollback(sid)
    # Remove any callbacks registered while this savepoint was active.
    self.run_on_commit = [
        (sids, func) for (sids, func) in self.run_on_commit if sid not in sids
    ]
So when the dry run rolled back sp1, it did not remove the callbacks that had been registered under the intermediate savepoints.
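A minimal sketch of why the filtering misses these callbacks (plain Python, no Django; the `run_on_commit` structure mimics Django's `(sids, func)` bookkeeping, and the savepoint id strings are made up). The key detail is that import-export creates sp1 via the low-level `savepoint()` helper, which does not register itself in `connection.savepoint_ids`, so row-level callbacks only carry the per-row savepoint id:

```python
# Simulate Django's connection.run_on_commit bookkeeping: each entry is
# (set of savepoint ids active at registration time, callback).
run_on_commit = []

# import-export creates sp1 with the low-level savepoint() helper, which
# does NOT push onto connection.savepoint_ids -- only atomic() blocks do.
sp1 = "s1"
active_atomic_savepoints = []  # mimics connection.savepoint_ids

# Each row is imported inside its own atomic() block.
row_sid = "s2"
active_atomic_savepoints.append(row_sid)
# post_save fires and registers an on_commit callback; it captures only the
# ids of the enclosing atomic() blocks -- sp1 is not among them.
run_on_commit.append((set(active_atomic_savepoints), lambda: "enqueue task"))
active_atomic_savepoints.pop()  # the row's atomic() exits successfully

# Dry run: import-export calls savepoint_rollback(sp1), which removes
# entries registered while sp1 was active -- but sp1 is in no entry's sids.
run_on_commit = [(sids, f) for (sids, f) in run_on_commit if sp1 not in sids]

print(len(run_on_commit))  # → 1: the callback survives the rollback
```

Because the surviving callback is still attached to the connection, it fires on the next real commit in the same request, which is the double enqueue reported above.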
Fix without updating the library
class BaseModelResource(resources.ModelResource):
    save_points = []

    def before_save_instance(self, instance, using_transactions, dry_run):
        """
        Override to add additional logic. Does nothing by default.
        Saving intermediate savepoints of txn
        """
        if using_transactions and dry_run:
            con = transaction.get_connection()
            self.save_points.extend(con.savepoint_ids)

    def after_import(self, dataset, result, using_transactions, dry_run, **kwargs):
        """
        Override to add additional logic. Does nothing by default.
        Manually removing commit hooks for intermediate savepoints of atomic transaction
        """
        if using_transactions and dry_run:
            con = transaction.get_connection()
            for sid in self.save_points:
                con.run_on_commit = [
                    (sids, func) for (sids, func) in con.run_on_commit if sid not in sids
                ]
        super().after_import(dataset, result, using_transactions, dry_run, **kwargs)
Inherit from this class to avoid the problem.
Solution Save all intermediate savepoints and remove their commit hooks.
Acceptance Criteria Will add later
Edit: Closed PR https://github.com/django-import-export/django-import-export/pull/1176 because it was created from the master branch.
Coverage increased (+0.005%) to 96.759% when pulling ec59cab44221f940bf40f2c21cb7605ed86de1af on pratikgajjar:bugfix/txn-commit-hook-dry-run into cd6ca0bb574a3a29d2ca18f965cba8898bce4a79 on django-import-export:master.
Thanks for the changes @pratikgajjar ! I see we are starting to play around with the connections in order not to trigger the signals. Does this work? Can we add some tests to confirm that behaviour?
Additionally, has it been considered to disconnect the signal temporarily instead of messing around with the connections? Seems like we might be going at this at a lower level than necessary. I used this, this past week and it worked like a charm.
Signal.disconnect(receiver=None, sender=None, dispatch_uid=None) With None arguments, does it disconnect the signal for all models? Otherwise, how will it know which signal to disconnect? And what if the signal function contains code that raises errors? Disconnecting would bypass post_save on the dry run, but the errors would still be raised on the real commit.
Edit: We are not playing around with the connection in order not to trigger the signal. The signal will still fire twice; the point is that anything scheduled to run only on a successful commit should not be triggered by the dry run.
Right, I was suggesting perhaps something like post_save.disconnect() could be done on a dry_run and then followed by a post_save.connect(). Not sure if that would solve the issue.
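That suggestion could be sketched as a context manager that disconnects a receiver for the duration of the dry run and reconnects it afterwards. This is a hypothetical helper, not part of django-import-export; it only assumes the `Signal.connect`/`Signal.disconnect` interface from Django's signal dispatcher:

```python
from contextlib import contextmanager

@contextmanager
def receiver_disconnected(signal, receiver, sender=None):
    """Temporarily disconnect `receiver` from `signal`, reconnecting on exit.

    Caveat: signal registration is process-wide state, so other requests
    served by the same worker also skip the receiver while this is active.
    """
    signal.disconnect(receiver, sender=sender)
    try:
        yield
    finally:
        signal.connect(receiver, sender=sender)
```

With Django loaded, usage would look like `with receiver_disconnected(post_save, contact_cohort_map_signal, sender=ContactCohortMap): resource.import_data(dataset, dry_run=True)`. The process-wide caveat in the docstring is exactly the concern raised later in this thread.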
Yes, I agree. I think this should be categorised as a bug in Django as well, because rolling back sp1 should also have removed the other commit hooks.
I don't think this is a good approach. Fiddling with run_on_commit is not documented behaviour inside Django. The only reliable way to prevent these on_commit hooks from firing is to always use an atomic() block and roll it back. As I understand it, atomic() is only used if the user sets the setting/class attribute to true, which is imo a bad default.
@adamchainz Thanks for your feedback. Yes, you are right: this all happens because the library uses manual savepoint tracking to roll back on dry run when using transactions. So instead of using savepoint_rollback we should use rollback. Is that what you are suggesting?
# file : import_export/resources.py
# line no : 588
def import_data_inner(self, dataset, dry_run, raise_errors, using_transactions, collect_failed_rows, **kwargs):
    . . .
    if using_transactions:
        # when transactions are used we want to create/update/delete object
        # as transaction will be rolled back if dry_run is set
        sp1 = savepoint()
Edit: If you can suggest a better way to remove the child savepoints, that would solve the problem as well.
I think if you use atomic(), you should be able to use set_rollback to tell Django to roll back, and then when you exit the atomic block it will do so. Afraid I can't give much more guidance on this right now, so my comments are a bit drive-by.
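A rough sketch of that suggestion follows, with tiny stand-ins for `atomic()` and `set_rollback()` so the control flow can be shown without a database. In real code these would be `django.db.transaction.atomic` and `django.db.transaction.set_rollback`; the point is that rolling back the outer atomic block is what discards the registered on_commit callbacks:

```python
from contextlib import contextmanager

# --- stand-ins for django.db.transaction internals (illustration only) ---
class FakeConnection:
    def __init__(self):
        self.needs_rollback = False
        self.run_on_commit = []   # callbacks registered via on_commit()
        self.rolled_back = False

connection = FakeConnection()

def set_rollback(flag):
    connection.needs_rollback = flag

def on_commit(func):
    connection.run_on_commit.append(func)

@contextmanager
def atomic():
    yield
    if connection.needs_rollback:
        # Rolling back the outermost atomic discards all pending callbacks.
        connection.rolled_back = True
        connection.run_on_commit = []
    else:
        for func in connection.run_on_commit:
            func()

# --- the suggested import flow ---
def import_data(dry_run):
    with atomic():
        # ... import each row; post_save handlers may register hooks ...
        on_commit(lambda: print("enqueue celery task"))
        if dry_run:
            set_rollback(True)  # roll back on exit instead of committing

import_data(dry_run=True)
print(connection.rolled_back, len(connection.run_on_commit))  # → True 0
```

Because the rollback happens when the outermost atomic exits, Django itself drops the pending callbacks, and no undocumented poking at run_on_commit is needed.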
Thanks for the contrib. I'm going to convert this to a draft since it is lacking test coverage and documentation.
@pratikgajjar Your proposed solution of inheriting from your BaseModelResource is not working for me. My signal still runs twice. Is it required to have the receiver use transaction.on_commit as you show in your case? I am not using Celery tasks. Has anyone else tried this solution and got it working?
@cachitas Yes, you need to use transaction.on_commit to wrap your function; the Celery task was just an example.
transaction.on_commit(lambda: your_function(args))
Thanks @pratikgajjar, that did the trick. I have been playing around with the transaction but I am not comfortable enough with it. Also, from @adamchainz comment it looks like we may need to rethink the current implementation.
For now I decided to go with @andrewgy8 suggestion of disconnecting the signal temporarily. It does not feel as robust, but it fits my needs.
Fun fact: while working on this, I found a bug in Signal.disconnect() (PR).
When you disconnect a signal, does it apply only to that execution, or if some other code (in another thread) is modifying the model in parallel, is the signal disconnected for it too?
You can test that with the flow below:
- Create an API endpoint which modifies your model.
- Add a sleep after disconnecting the signal.
- From the Django admin, modify this model.
- Hit the API endpoint (step 3 takes a row-level lock, so I don't think you will face any db-lock issues).
Ideally it should trigger the signal.
Test it with python manage.py runserver (single-threaded, so it will block step 4) and with gunicorn or uwsgi.
Another way to solve this problem is not to use multiple savepoints for a dry run within a db txn, but that requires a good understanding of this library, and I am not sure what else it might break.
I have gone through, step by step, each line of code Django executes in order to fix this problem. The method I used is also used by Django internally to roll back a txn.
We have been using this fix in production for a good amount of time and have not faced any issues.
When you disconnect a signal, does it apply only to that execution, or if some other code (in another thread) is modifying the model in parallel, is the signal disconnected for it too?

Indeed, with a sleep() after disconnecting, modifying the model from other sources did not trigger the signal using runserver. Using gunicorn with 4 workers, the signal triggered correctly during sleep().
So execution depends on how you run your code: the signal is disabled within the worker process. If the same worker had served that request, it also would not have triggered the signal.
Thank you for testing it.
Hi Pratik - this has been around for a while. Do you want us to resurrect it for consideration for inclusion in the library? If so, we'll have to have a discussion about whether it is still required. We should do this before embarking on any major work.
Closing due to inactivity