
Concurrent imports leading to unique key errors / double import

Open guewen opened this issue 10 years ago • 9 comments

Introduction

Despite a recent improvement with advisory locks to prevent concurrent import of the same records, we still see some errors due to timing issues.

The issue

The effect is that one of the jobs fails with a UNIQUE key error because 2 bindings are created with the same magento_id. Worse, if we don't have a UNIQUE key, we end up with 2 identical records.

Here is Python-ish pseudo-code showing the flow where this issue happens (this is not at all the actual magentoerpconnect code, but a huge simplification):


def import_partner_record(model, magento_id):
    # a binder returns the Odoo ID for a Magento ID
    binding = binder_for(model).to_openerp(magento_id)
    magento_data = backend_adapter_for(model).read(magento_id)

    # import the dependency: the partner category
    import_partner_category_record('magento.res.partner.category',
                                   magento_data['group_id'])

    # convert the Magento data into Odoo values
    values = mapper_for(model).map_record(magento_data).values()
    if binding:
        binding.write(values)
    else:
        model.create(values)

def import_partner_category_record(model, magento_id):
    # a binder returns the Odoo ID for a Magento ID
    binding = binder_for(model).to_openerp(magento_id)
    magento_data = backend_adapter_for(model).read(magento_id)

    values = mapper_for(model).map_record(magento_data).values()
    if binding:
        binding.write(values)
    else:
        model.create(values)

# We import 2 different partner records; both depend on the
# 'magento.res.partner.category' with ID 10.
# Neither the partners nor their category exist yet.
import_partner_record.delay('magento.res.partner', 1)
import_partner_record.delay('magento.res.partner', 2)

As shown above, when a job runs to import a partner and one of its dependencies does not exist (here the category), it runs a sub-import to fetch it (and this recurses over the dependencies of the dependencies).

Two concurrent jobs may import the same record at the same time; this happens very rarely on the main record (here the partner) but more often on the dependencies (here the partner category).

Given T1 and T2, the transactions of our 2 jobs: when T2 starts, it cannot see that T1 is already importing the same record.

The mechanism implemented in #163 improves on this by forcing each job to acquire an advisory lock for the record it imports (a sketch of the lock acquisition follows the diagram). In the diagram below, each block represents a lock. We can see that when T2 wants to start the import of Category X, it cannot acquire the lock. In such a situation, the job raises a RetryableJobError and is retried later.

T1 and T2 are the transactions
c is the commit of the transaction, r is the rollback of the transaction
---Time--->
> T1 /------------------------\ c
> T1 | Import Partner A       | c
> T1 \------------------------/ c
> T1        /-----------------\ c
> T1        | Imp. Category X | c
> T1        \-----------------/ c
      > T2 /------------------------\ r
      > T2 | Import Partner B       | r
      > T2 \------------------------/ r
      > T2        !!!!!!!!!!!!!!!!!!! r
      > T2        ! Imp. Category X ! r
      > T2        !!!!!!!!!!!!!!!!!!! r
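
For reference, the lock acquisition works roughly like this. This is a simplified sketch modeled on the connector's advisory-lock helpers, not the literal #163 code; advisory_lock_or_retry mirrors the method used in the snippet further down:

import hashlib
import struct

from openerp.addons.connector.exception import RetryableJobError

def pg_try_advisory_lock(cr, lock_name):
    # advisory locks are keyed on integers, so derive a stable
    # 64-bit key from the lock name
    hasher = hashlib.sha1(lock_name.encode('utf-8'))
    int_lock, = struct.unpack('q', hasher.digest()[:8])
    # transaction-level lock: held until commit or rollback; never
    # blocks, returns False if another transaction holds it
    cr.execute('SELECT pg_try_advisory_xact_lock(%s)', (int_lock,))
    acquired, = cr.fetchone()
    return acquired

def advisory_lock_or_retry(cr, lock_name, retry_seconds=1):
    # reschedule the job instead of blocking on the lock
    if not pg_try_advisory_lock(cr, lock_name):
        raise RetryableJobError('Could not acquire advisory lock',
                                seconds=retry_seconds)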

Unfortunately, it does not cover all the possible timing issues, as the following diagram shows:

T1 and T2 are the transactions
c is the commit of the transaction, r is the rollback of the transaction
---Time--->
> T1 /------------------------\ c
> T1 | Import Partner A       | c
> T1 \------------------------/ c
> T1        /-----------------\ c
> T1        | Imp. Category X | c
> T1        \-----------------/ c
                      > T2 /------------------------\ c
                      > T2 | Import Partner B       | c
                      > T2 \------------------------/ c
                      > T2        /-----------------\ c
                      > T2        | Imp. Category X | c
                      > T2        \-----------------/ c

Given that Odoo uses the repeatable read isolation level (the snapshot is taken at the first statement of the transaction) and that T2 starts before T1 is committed, T2 is not aware that T1 already created Category X. And given that T1 is committed before the import of Category X starts in T2, no advisory lock remains to prevent T2 from importing Category X again.
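
The snapshot behaviour is easy to reproduce with two plain psycopg2 connections (a standalone demo; the 'demo' database and the category table are made up for illustration):

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_REPEATABLE_READ

t1 = psycopg2.connect(database='demo')
t2 = psycopg2.connect(database='demo')
t1.set_isolation_level(ISOLATION_LEVEL_REPEATABLE_READ)
t2.set_isolation_level(ISOLATION_LEVEL_REPEATABLE_READ)
c1, c2 = t1.cursor(), t2.cursor()

# T2 runs its first query (importing Partner B): in repeatable read,
# its snapshot is frozen from this point on
c2.execute("SELECT count(*) FROM category WHERE magento_id = 10")

# T1 creates and commits Category X
c1.execute("INSERT INTO category (magento_id) VALUES (10)")
t1.commit()

# T2 still cannot see Category X, so it would import it a second time
c2.execute("SELECT count(*) FROM category WHERE magento_id = 10")
print(c2.fetchone())  # -> (0,)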

Ideas for solutions:

  1. Change the isolation level of the transaction to Serializable: at the commit of T2, it would see that the SELECT searching for the binding no longer returns the same result and would roll back (to retry later)
  2. Change the isolation level of the transaction to Read committed: when we call binding = binder_for(model).to_openerp(magento_id), the SELECT is able to see that T1 created the category (a new snapshot is taken at each statement)
  3. Open a fresh transaction (so it has a new snapshot) at the beginning of the import, just to check whether the binding already exists; if it does, raise a RetryableJobError to retry the job later

I guess option 1 would raise a hell of a lot of serialization errors, as repeatable read already suffers from plenty of them.
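
To make the cost concrete, every job transaction would then need a retry wrapper along these lines (a plain psycopg2 sketch, not connector code; run_serializable is a hypothetical helper):

import psycopg2
from psycopg2.extensions import (ISOLATION_LEVEL_SERIALIZABLE,
                                 TransactionRollbackError)

def run_serializable(conn, work, max_retries=5):
    # PostgreSQL reports serialization conflicts with SQLSTATE 40001,
    # which psycopg2 raises as TransactionRollbackError
    conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
    for _ in range(max_retries):
        try:
            work(conn.cursor())
            conn.commit()
            return
        except TransactionRollbackError:
            conn.rollback()  # conflict detected: retry from scratch
    raise RuntimeError('too many serialization failures')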

I implemented solution 3 for another project where I have the same issue; it looks like this:


from contextlib import closing, contextmanager

import openerp
from openerp.addons.connector.exception import RetryableJobError
from openerp.addons.connector.session import ConnectorSession
from openerp.addons.connector.unit.binder import Binder

RETRY_ON_ADVISORY_LOCK = 1  # seconds before retry (illustrative value)
RETRY_WHEN_CONCURRENT_DETECTED = 1  # seconds before retry (illustrative value)

# `connector` here is the project's own module providing the base Importer
class Importer(connector.Importer):

    @contextmanager
    def do_in_new_connector_env(self, model_name=None):
        """ Context manager that yields a new connector environment

        It uses a new Odoo Environment, thus a new PG transaction.

        This can be used to make a preemptive check in a new
        transaction, for instance to see if another transaction
        already did the work.
        """
        with openerp.api.Environment.manage():
            registry = openerp.modules.registry.RegistryManager.get(
                self.env.cr.dbname
            )
            with closing(registry.cursor()) as cr:
                try:
                    new_env = openerp.api.Environment(cr, self.env.uid,
                                                      self.env.context)
                    new_connector_session = ConnectorSession.from_env(new_env)

                    connector_env = self.connector_env.create_environment(
                        self.backend_record.with_env(new_env),
                        new_connector_session,
                        model_name or self.model._name,
                        connector_env=self.connector_env
                    )

                    yield connector_env
                except:
                    cr.rollback()
                    raise
                else:
                    cr.commit()

    def _run(self, record):
        self.external_id = self.external_id_from_record(record)
        lock_name = 'import({}, {}, {}, {})'.format(
            self.backend_record._name,
            self.backend_record.id,
            self.model._name,
            self.external_id,
        )
        # Keep a lock on this import until the transaction is committed
        self.advisory_lock_or_retry(lock_name,
                                    retry_seconds=RETRY_ON_ADVISORY_LOCK)

        self.external_record = record

        binding = self._get_binding()
        if not binding:
            with self.do_in_new_connector_env() as new_connector_env:
                binder = new_connector_env.get_connector_unit(Binder)
                if binder.to_openerp(self.external_id):
                    raise RetryableJobError(
                        'Concurrent error. The job will be retried later',
                        seconds=RETRY_WHEN_CONCURRENT_DETECTED,
                        ignore_retry=True
                    )

        # continue the import...

It works as expected, but I am a bit bothered by having to open a new transaction for this, as much for the overhead as for the lack of clarity of the operation.

That leaves option 2, which I have some difficulty evaluating: what are the risks?

Do you have other ideas? Which one do you prefer?

guewen avatar Nov 05 '15 12:11 guewen

I like option 2 because it's the most efficient, and I don't think there are any side effects.

pedrobaeza avatar Nov 05 '15 15:11 pedrobaeza

I'm not a specialist, but as far as I know it's not possible to change the isolation level of the transaction, since the isolation level is set on the connection. If I'm not wrong, option 2 is therefore not possible without affecting the other processes.

lmignon avatar Nov 06 '15 06:11 lmignon

@lmignon the isolation level can be set on the current transaction using the SET TRANSACTION command.

In Odoo you can get Read committed by passing the argument serialized=False when creating a cursor. I tried serialized=False in the creation of the cursor and it seems to work well. It would be possible to add a feature in the connector to choose the isolation level for a delayed job.
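
For illustration, something along these lines (a sketch assuming the Odoo 8.0-era sql_db API, where serialized=True maps to REPEATABLE READ and serialized=False to READ COMMITTED; check_binding_read_committed is a made-up helper):

from contextlib import closing

import openerp

def check_binding_read_committed(dbname, query, args):
    # equivalent to starting the transaction with
    # SET TRANSACTION ISOLATION LEVEL READ COMMITTED: each statement
    # gets a fresh snapshot and sees rows committed in the meantime
    db = openerp.sql_db.db_connect(dbname)
    with closing(db.cursor(serialized=False)) as cr:
        cr.execute(query, args)
        return cr.fetchone()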

@pedrobaeza is that based on intuition or on facts? By definition this isolation level provides fewer ACID guarantees, so lowering it should be based on thorough reasoning. Even though, based on intuition, I tend to agree with you that this should not be an issue here, I'm looking for possible problematic scenarios or for a reasoning that shows it's safe.

guewen avatar Nov 06 '15 08:11 guewen

Only intuition, because it doesn't include retries.

pedrobaeza avatar Nov 06 '15 08:11 pedrobaeza

@guewen Thank you for the explanations.

lmignon avatar Nov 06 '15 08:11 lmignon

Also intuition about the side effects (I read it very fast), by the same reasoning about the reduced guarantees of the isolation level.

pedrobaeza avatar Nov 06 '15 08:11 pedrobaeza

@pedrobaeza I have no doubt that this solution plus the advisory lock would work fine to prevent the double import described in the issue. My only concern is about the data integrity guarantees if we use Read committed, which is a weak isolation level.

An explanation of its weaknesses: http://sqlperformance.com/2014/04/t-sql-queries/the-read-committed-isolation-level

But perhaps these weaknesses are not a problem for importing records from Magento; that's what I am trying to figure out.

guewen avatar Nov 06 '15 08:11 guewen

Couldn't 'recursive' sub-imports (especially ones on small fast imports like this) just use explicit locking? LOCK TABLE mage_partner_categories IN ACCESS EXCLUSIVE MODE;

T2 wouldn't even get to read from the category table until T1 commits or rolls back. And when it does, it will read the category -as if- it were there all along.

jaredkipe avatar Nov 06 '15 22:11 jaredkipe

> Couldn't 'recursive' sub-imports (especially ones on small fast imports like this) just use explicit locking? LOCK TABLE mage_partner_categories IN ACCESS EXCLUSIVE MODE;

Thanks for the suggestion. As we also have those sub-imports on large tables like magento.res.partner or magento.product.product, it would be a pity to have to lock the full table. Anyway, it would not bring anything better than what the advisory locking already does (and the advisory lock is more fine-grained): when T1 commits or rolls back, the table lock is released, but T2 still doesn't see the new record, as we are in repeatable read isolation mode.

guewen avatar Nov 09 '15 12:11 guewen