kinto
kinto copied to clipboard
TOCTOU issues at resource level
Per https://github.com/Kinto/kinto/pull/1406#issuecomment-347924141, there are TOCTOU issues in the resource level methods that wrap the storage backend. Most of these methods have code that looks something like:
def delete(self):
# ...
record = self._get_record_or_404(self.record_id)
self._raise_412_if_modified(record)
# ...
deleted = self.model.delete_record(record, last_modified=last_modified)
The problem here is that another transaction could, in the time between the get
and the delete
, update the record in ways that would cause a race condition. It could update the last_modified of the time, thereby causing the 412 check to be invalidated; it could delete the record, causing the _or_404 check to be invalidated.
The affected methods are collection_post
, put
, delete
, and patch
.
I don't see any obvious solution to this problem, but I see a bunch of relatively complicated ones:
- Cause the storage backend to lock the record when we do the
get
. This would require breaking compatibility -- we'd have to add a new "lock" argument to the storage backends -- and it would only let us solve for the case of "match any" or "match this version", but not "match none". - Cause the storage backend to return the "old" version of the record as well as the new one. This would allow us to keep the 412 and 404 checks in the resource class, and rely on transaction rollbacks to take care of violations. Unfortunately, this would also be a breaking change, and would disqualify any backend that isn't already transactional.
- Push the 412 and 404 constraints into the storage layer. This would still be a breaking change but any storage backend that can do an atomic compare-and-set would still be OK. (Any storage backend that supports running in a single-threaded mode would qualify.) Pushing 404 is easy -- most storage backends already support the
RecordNotFoundError
-- but pushing 412s requires some work and some fancy SQL footwork.
It's not an issue to have backend breaking changes, we already had a number of them. Also we already have features that needs transactions and work in a degraded mode for memory and redis backends (that's why we moved redis out of the scope of kinto because we didn't had time to add transaction support for kinto)
@ronhanson is having a usecase that might need this exact feature. findAndUpdate, maybe he can have insight about which solution he would take.
pushing 412s requires some work and some fancy SQL footwork.
Can we add a WHERE last_modified = 'previous_last_modified'
clause for the update or delete part and raise a 412 if no records where updated/deleted?
A PUT
can be a create or an update. There's no such thing as INSERT ... WHERE last_modified = 'previous_last_modified'
.
I thought we were trying to do the insert and in case of conflict handling the update?
I think that might be the right approach. Be sure to handle the case of If-Match
, which has to fail if there is no record with which to conflict.
Pardon my kinto-juniorness but...
Feels like this issue is crossing paths with https://github.com/Kinto/kinto/pull/1598 which is about doing inserts.
It feels like the bestest way would be to use row-level locking. And for inserts, rely on idx_records_parent_id_collection_id_last_modified
.
The TOCTOU problem happens because we "disconnect" between the client and the database whilst we figure out whether to proceed with the delete/update or whether to raise an error which results in a 419(?) HTTP error.
By doing something like:
SELECT last_modified FROM records
WHERE id=:object_id and parent_id=:parent_id and collection_id=:collection_id
FOR UPDATE NOWAIT
...first, we can be totally confident how to proceed. If the record existed the row-level locking will protect us from a single process getting access to the UPDATE/DELETE to come.
If it didn't exist and it's an insert/upsert, you can't lock the whole table or anything like that but you are also blocked from doing two inserts with the same id|parent_id|collection_id
combo. E.g.
INSERT INTO records (id, parent_id, collection_id, data, last_modified) VALUES
(:object_id, :parent_id, :collection_id, (:data)::JSONB, :last_modified)
and, on the Python level you watch out for an IntegrityError
. E.g.
try:
cursor.execute('INSERT INTO ...')
except IntegrityError as exception:
if 'records_pkey' in exception.args[0]:
raise TooLateToInsertError('419 for you!')
raise
Perhaps adding proper Postgresql locking is already something we're planning or perhaps there are circumstances I don't (yet) really understand.
On the Storage interface, it appears to be create
, update
and delete
only. Right? So we can write one implementation of the upsert-with-locking for each. I'd be happy to volunteer where I can.
By the way, the delete
case is solvable without a lock using @Natim 's idea of either using WHERE last_modified = 'previous_last_modified'
or something like this:
cur.execute("""
DELETE FROM records WHERE
id=:object_id and parent_id=:object_id and collection_id=:collection_id
and last_modified=from_epoch(:last_modified)
RETURNING id
""", {...})
fetched = cur.fetchone()
if not fetched:
raise TooLateToDeleteError('419 for you!')
I have prototypes for insert, upsert, update and delete all as python scripts that are pure psycopg2 based. I use a threadpool and some time.sleep()
to trigger the race conditions and in all cases I can "prove" the prototypes. Is this interesting?
In all my prototypes I have not yet worried myself with the complexity of tombstones but I'm sure we can factor it in.
The only thing that is a bit stinky is that the interface between Python and Postgres is that you have to rely on the driver to wrap Postgres errors into exception which you then capture and analyze. E.g. this.
I would be curious to see your scripts.
I'm trying to collect my thoughts. https://github.com/Kinto/kinto/issues/1525 is all about the create()
method. This issue has escalated into a discussion about leveraging proper row-level locks.
Hopefully we can break this up. This issue can be about using row-level locks to do the logical delete. And that other issue can be about using Integrity checks instead of the overly complicated upsert.
So, my prototype for row-level locking is really trivial in comparison to "real kinto" in that, in kinto, we use SQLAlchemy instead of handcrafted SQL strings. For example, the prototype assumes that you always include, as parameters, the last_modified
. Perhaps it needs to be smarter and break that up for with- and without a last_modified
.
If you think the row-level prototype is sane in theory, I can start attempting a patch.