coaster icon indicating copy to clipboard operation
coaster copied to clipboard

failsafe_add should use UPSERT where available

Open jace opened this issue 8 years ago • 4 comments

Nested transactions are nice, but UPSERT is more reliable. failsafe_add should use it when it detects the underlying engine supports it.

SQLAlchemy supports custom compilation for standard statements like INSERT (docs). We could have tables specifically flag support for upserts in __table_args__ like this:

class MyTable(db.Model):
    __tablename__ = 'my_table'
    __table_args__ = ({'upsert': None},)

This should generate an ON CONFLICT DO NOTHING clause. The alternative ON CONFLICT DO UPDATE is trickier as it requires specifying specific columns to update. In this case the syntax is:

class MyTable(db.Model):
    __tablename__ = 'my_table'
    __table_args__ = ({'upsert': ['col1', 'col2', …]},)

Some fields like id and created_at should not be updated in case of a conflict, while most other fields need to be. The table should whitelist (or blacklist?) fields to update.

Finally, this approach to upsert will result in the parameter to failsafe_add no longer guaranteed to be the same entity persisted to disk. session.merge can't be used either because the primary key is not guaranteed to match, especially with UUID primary keys that are currently generated client-side in Coaster. Since PostgreSQL's ON CONFLICT clause does not tell us whether a conflict occurred, failsafe_add has no option but to attempt reloading each time, even if the object is no longer needed. In this case the caller should avoid passing filters to failsafe_add so it skips the reload (and failsafe_add should support optional filters).

jace avatar Apr 05 '16 09:04 jace

Proposed syntax change for the upsert key in __table_args__:

  • None: No upsert at all
  • []: ON CONFLICT DO NOTHING
  • ['col1', 'col2', …]: ON CONFLICT DO UPDATE SET col1 = :col1, col2 = :col2, …

jace avatar Apr 05 '16 10:04 jace

Problem: PostgreSQL's docs say ON CONFLICT DO UPDATE requires a mandatory conflict target (optional for DO NOTHING), so the SQL syntax will be like one of the following:

  1. ON CONFLICT (col_name) DO UPDATE …
  2. ON CONFLICT ON CONSTRAINT constraint_name DO UPDATE …

This means we can't simply specify {'upsert': ['col1', 'col2', …]} in __table_args__. We also need somewhere to specify the conflict target in either of its forms.

jace avatar Apr 05 '16 10:04 jace

It is also possible to use an INSERT ... WHERE NOT EXISTS clause that works on PostgreSQL < 9.5.

  1. http://stackoverflow.com/a/36377530/78903
  2. http://stackoverflow.com/a/17991647/78903

This is not as reliable as an upsert, but is still better than making distinct roundtrips to the database, and could still be protected inside a nested transaction.

jace avatar Apr 07 '16 08:04 jace

SQLAlchemy has support for upserts now but only with insert statements, not at the declarative level.

jace avatar Jun 07 '17 05:06 jace