kinto icon indicating copy to clipboard operation
kinto copied to clipboard

TOCTOU issues at resource level

Open glasserc opened this issue 7 years ago • 10 comments

Per https://github.com/Kinto/kinto/pull/1406#issuecomment-347924141, there are TOCTOU issues in the resource level methods that wrap the storage backend. Most of these methods have code that looks something like:

    def delete(self):
        # ... 
        record = self._get_record_or_404(self.record_id)
        self._raise_412_if_modified(record)
        # ...

        deleted = self.model.delete_record(record, last_modified=last_modified)

The problem here is that another transaction could, in the time between the get and the delete, update the record in ways that would cause a race condition. It could update the last_modified of the time, thereby causing the 412 check to be invalidated; it could delete the record, causing the _or_404 check to be invalidated.

The affected methods are collection_post, put, delete, and patch.

I don't see any obvious solution to this problem, but I see a bunch of relatively complicated ones:

  • Cause the storage backend to lock the record when we do the get. This would require breaking compatibility -- we'd have to add a new "lock" argument to the storage backends -- and it would only let us solve for the case of "match any" or "match this version", but not "match none".
  • Cause the storage backend to return the "old" version of the record as well as the new one. This would allow us to keep the 412 and 404 checks in the resource class, and rely on transaction rollbacks to take care of violations. Unfortunately, this would also be a breaking change, and would disqualify any backend that isn't already transactional.
  • Push the 412 and 404 constraints into the storage layer. This would still be a breaking change but any storage backend that can do an atomic compare-and-set would still be OK. (Any storage backend that supports running in a single-threaded mode would qualify.) Pushing 404 is easy -- most storage backends already support the RecordNotFoundError -- but pushing 412s requires some work and some fancy SQL footwork.

glasserc avatar Nov 29 '17 19:11 glasserc

It's not an issue to have backend breaking changes, we already had a number of them. Also we already have features that needs transactions and work in a degraded mode for memory and redis backends (that's why we moved redis out of the scope of kinto because we didn't had time to add transaction support for kinto)

@ronhanson is having a usecase that might need this exact feature. findAndUpdate, maybe he can have insight about which solution he would take.

Natim avatar Apr 17 '18 16:04 Natim

pushing 412s requires some work and some fancy SQL footwork.

Can we add a WHERE last_modified = 'previous_last_modified' clause for the update or delete part and raise a 412 if no records where updated/deleted?

Natim avatar Apr 17 '18 16:04 Natim

A PUT can be a create or an update. There's no such thing as INSERT ... WHERE last_modified = 'previous_last_modified'.

glasserc avatar Apr 18 '18 13:04 glasserc

I thought we were trying to do the insert and in case of conflict handling the update?

Natim avatar Apr 18 '18 13:04 Natim

I think that might be the right approach. Be sure to handle the case of If-Match, which has to fail if there is no record with which to conflict.

glasserc avatar Apr 18 '18 14:04 glasserc

Pardon my kinto-juniorness but...

Feels like this issue is crossing paths with https://github.com/Kinto/kinto/pull/1598 which is about doing inserts.

It feels like the bestest way would be to use row-level locking. And for inserts, rely on idx_records_parent_id_collection_id_last_modified.

The TOCTOU problem happens because we "disconnect" between the client and the database whilst we figure out whether to proceed with the delete/update or whether to raise an error which results in a 419(?) HTTP error.

By doing something like:

SELECT last_modified FROM records
WHERE id=:object_id and parent_id=:parent_id and collection_id=:collection_id
FOR UPDATE NOWAIT

...first, we can be totally confident how to proceed. If the record existed the row-level locking will protect us from a single process getting access to the UPDATE/DELETE to come.

If it didn't exist and it's an insert/upsert, you can't lock the whole table or anything like that but you are also blocked from doing two inserts with the same id|parent_id|collection_id combo. E.g.

INSERT INTO records (id, parent_id, collection_id, data, last_modified) VALUES
(:object_id, :parent_id, :collection_id, (:data)::JSONB, :last_modified)

and, on the Python level you watch out for an IntegrityError. E.g.

try:
    cursor.execute('INSERT INTO ...')
except IntegrityError as exception:
    if 'records_pkey' in exception.args[0]:
        raise TooLateToInsertError('419 for you!')
    raise

Perhaps adding proper Postgresql locking is already something we're planning or perhaps there are circumstances I don't (yet) really understand.

On the Storage interface, it appears to be create, update and delete only. Right? So we can write one implementation of the upsert-with-locking for each. I'd be happy to volunteer where I can.

peterbe avatar Apr 18 '18 18:04 peterbe

By the way, the delete case is solvable without a lock using @Natim 's idea of either using WHERE last_modified = 'previous_last_modified' or something like this:

cur.execute("""
    DELETE FROM records WHERE
    id=:object_id and parent_id=:object_id and  collection_id=:collection_id
    and last_modified=from_epoch(:last_modified)
    RETURNING id
""", {...})
fetched = cur.fetchone()
if not fetched:
    raise TooLateToDeleteError('419 for you!')

peterbe avatar Apr 18 '18 18:04 peterbe

I have prototypes for insert, upsert, update and delete all as python scripts that are pure psycopg2 based. I use a threadpool and some time.sleep() to trigger the race conditions and in all cases I can "prove" the prototypes. Is this interesting?

In all my prototypes I have not yet worried myself with the complexity of tombstones but I'm sure we can factor it in.

The only thing that is a bit stinky is that the interface between Python and Postgres is that you have to rely on the driver to wrap Postgres errors into exception which you then capture and analyze. E.g. this.

peterbe avatar Apr 18 '18 18:04 peterbe

I would be curious to see your scripts.

glasserc avatar Apr 19 '18 13:04 glasserc

I'm trying to collect my thoughts. https://github.com/Kinto/kinto/issues/1525 is all about the create() method. This issue has escalated into a discussion about leveraging proper row-level locks.

Hopefully we can break this up. This issue can be about using row-level locks to do the logical delete. And that other issue can be about using Integrity checks instead of the overly complicated upsert.

So, my prototype for row-level locking is really trivial in comparison to "real kinto" in that, in kinto, we use SQLAlchemy instead of handcrafted SQL strings. For example, the prototype assumes that you always include, as parameters, the last_modified. Perhaps it needs to be smarter and break that up for with- and without a last_modified.

If you think the row-level prototype is sane in theory, I can start attempting a patch.

peterbe avatar Apr 24 '18 20:04 peterbe