
Race conditions possible on the `Source.interaction_count`

Open heartsucker opened this issue 6 years ago • 4 comments

Description

We rely heavily on the leading integer of a `{Submission,Reply}.filename` (e.g., `1-foo_bar-msg.gpg`). This number is not atomically incremented in the backend, whether via the source web interface, the journalist web interface, or the journalist API.

Source web UI:

https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/source_app/main.py#L142-L158

Journalist web UI:

https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/journalist_app/main.py#L103-L105

Journalist API:

https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/journalist_app/api.py#L233-L242

Steps to Reproduce

Create a source and submit a message.

Then in source_app/main.py add the following global variable

DO_SLEEP = False

and in the reply() function, add:

        global DO_SLEEP
        if msg:
            g.source.interaction_count += 1
            if DO_SLEEP:
                DO_SLEEP = False
                import time
                time.sleep(30)
            fnames.append(
                current_app.storage.save_message_submission(
                    g.filesystem_id,
                    g.source.interaction_count,
                    journalist_filename,
                    msg))

This will cause the first message to take a long time to submit, then the next to submit immediately.

Open two tabs as the source. Submit two messages in rapid succession.

Go into the Docker container and look at the db:

sqlite> select * from submissions;
1|b5db5ac5-071f-49d1-b387-df5cda75c5c9|1|1-anomalous_yield-msg.gpg|604|1
2|d5f643cc-bd4c-486f-95bc-23e4c8c4ca51|1|2-anomalous_yield-msg.gpg|604|1
3|7a2d77af-cd72-4367-a6d4-4fc7da4c515e|2|1-concealing_semaphore-msg.gpg|604|1
4|2eac4b7b-a87c-486b-886b-a19919666f8b|2|2-concealing_semaphore-msg.gpg|604|1
5|f70ab493-4b98-44a2-94f8-75462349adfd|3|1-terminated_diarist-msg.gpg|602|0
6|20d5100e-8714-4654-b729-6fd8437d0f48|3|2-terminated_diarist-msg.gpg|600|0
7|33cd5216-1189-4beb-904e-bbdbfd5a6857|3|3-terminated_diarist-msg.gpg|593|0
8|76d735cf-ce5c-4950-9dd0-0a9dd94f21aa|3|3-terminated_diarist-msg.gpg|594|0

Note the last two rows. There are two messages with the same int prefix.
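For illustration, the same lost-update interleaving can be reproduced without the web stack at all. This toy sketch (hypothetical code, not from SecureDrop) forces two threads to read the counter before either writes it back, mimicking the `time.sleep(30)` trick above:

```python
import threading
import time

class Source:
    """Toy stand-in for the ORM object; not SecureDrop code."""
    def __init__(self):
        self.interaction_count = 0

source = Source()
seen = []

def submit():
    # Read-modify-write with a pause in the middle: both threads read the
    # old value before either one writes the incremented value back.
    current = source.interaction_count
    time.sleep(0.1)
    source.interaction_count = current + 1
    seen.append(source.interaction_count)

threads = [threading.Thread(target=submit) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)  # prints [1, 1]: both submissions got the same filename prefix
```

The sleep just widens the race window; without it the same interleaving is possible, only far less likely.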

Expected Behavior

It should not be possible to create messages with identical interaction counts.

Actual Behavior

It is.

Comments

A journalist or source should never receive messages out of order:

  1. In general
  2. Especially in a high-security environment where clear communication has security implications

Possible Fixes

All of these calls should be wrapped in a SQLite exclusive transaction. This is not the same as the implicit transaction that is begun when the first `db.session.add(...)` or `db.session.delete(...)` (or similar) is called. The transaction needs to be explicitly started before the `Source` object is read, and committed after the update.

Additionally, we could manually write some SQL that sets these all at once so that the interaction count and filenames are set within the SQL engine to avoid having to lock the entire table for a long time.
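As a rough sketch of both ideas, using the stdlib `sqlite3` module against a simplified stand-in `sources` table (the real schema differs): take the write lock up front with `BEGIN EXCLUSIVE`, and let the SQL engine do the addition rather than Python:

```python
import sqlite3

# Hypothetical stand-in for the real `sources` table; names are assumptions.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, interaction_count INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sources VALUES (1, 0)")

def next_interaction_count(conn, source_id):
    """Increment and return the counter without a read-modify-write race."""
    # BEGIN EXCLUSIVE acquires the write lock before anything is read, so no
    # other connection can slip in between our read and our write.
    conn.execute("BEGIN EXCLUSIVE")
    try:
        # The addition happens inside the SQL engine, not in Python, so the
        # stored value never depends on a stale in-process copy.
        conn.execute(
            "UPDATE sources SET interaction_count = interaction_count + 1 "
            "WHERE id = ?",
            (source_id,),
        )
        (count,) = conn.execute(
            "SELECT interaction_count FROM sources WHERE id = ?", (source_id,)
        ).fetchone()
        conn.execute("COMMIT")
        return count
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Two back-to-back calls then return 1 and 2, never the same value twice; the trade-off is that the exclusive lock serializes all writers for the duration of the transaction, which is why keeping the transaction short (as the second suggestion does) matters.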

heartsucker avatar Jan 18 '19 14:01 heartsucker

@heartsucker nice catch on this one.

kushaldas avatar Jan 18 '19 14:01 kushaldas

Also, what are the chances of hitting this by using the client? It might be a good issue to solve, but moving to Xenial will take a lot of time from all of us.

kushaldas avatar Jan 18 '19 14:01 kushaldas

Right now the client just fires off all replies as they're sent, so the risk is higher on the client than on the web UI. However, since a reply is a couple KB at most, replies are going to be received and written to disk very fast, so the window for this happening is very narrow. I don't think this is very high priority right now, but it should definitely be tackled post-Xenial. Maybe once I get replies finished on the client, I can try to resolve this, because I generally have a goal of "strict conversation ordering" in the back of my head, and there are a few things that impact this, with this ticket possibly being the only server-side one.

heartsucker avatar Jan 18 '19 15:01 heartsucker

Here to revive this issue and say that I ran into this when testing some reply-related updates on the client. When the client creates a new Reply object locally (because a DraftReply was just successfully sent and it's being promoted to a Reply), it uses the same logic. However, since the interaction_count is updated by the server (so it's updated on a successful sync), if the client fires off a bunch of replies before syncing with the server and experiences any network or sync issues, there could definitely be conflicting local vs remote messages as above.

Another small note to our future selves: There is some sneaky behaviour with addition-assignment and we might want to take another look at how we're incrementing those counters; see eg https://stackoverflow.com/questions/2334824/how-to-increase-a-counter-in-sqlalchemy/2334917#comment30697297_2334917
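On that note, a minimal SQLAlchemy sketch (hypothetical model, not the real `Source` class) of the non-racy alternative: instead of `+=`, which reads the value already loaded in Python and writes the literal result back, assign a column expression so the emitted UPDATE is `SET interaction_count = interaction_count + 1` and the increment happens inside the database:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical minimal model standing in for SecureDrop's Source.
class Source(Base):
    __tablename__ = "sources"
    id = Column(Integer, primary_key=True)
    interaction_count = Column(Integer, nullable=False, default=0)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Source(id=1, interaction_count=0))
    session.commit()

with Session(engine) as session:
    src = session.get(Source, 1)
    # Racy version (what `+=` effectively does):
    #   src.interaction_count = src.interaction_count + 1
    # Two concurrent requests can both read 3 and both write back 4.
    #
    # Atomic version: assigning a SQL expression makes SQLAlchemy emit
    #   UPDATE sources SET interaction_count = interaction_count + 1
    # so the addition runs server-side at flush time.
    src.interaction_count = Source.interaction_count + 1
    session.commit()
```

Note this only makes the increment itself atomic; it does not by itself return the new value for building the filename, so it would still need to be combined with a transaction along the lines suggested above.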

rocodes avatar Mar 21 '22 23:03 rocodes