Race conditions possible on `Source.interaction_count`
Description
We rely heavily on the leading integer of a `{Submission,Reply}.filename` (e.g., `1-foo_bar-msg.gpg`). This number is not atomically incremented in the backend, neither via the source or journalist web interfaces nor via the journalist API.
Source web UI:
https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/source_app/main.py#L142-L158
Journalist web UI:
https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/journalist_app/main.py#L103-L105
Journalist API:
https://github.com/freedomofpress/securedrop/blob/54c79107b794af72e751915248c2571b35967b7f/securedrop/journalist_app/api.py#L233-L242
Steps to Reproduce
Create a source and submit a message.
Then in `source_app/main.py` add the following global variable:

```python
DO_SLEEP = False
```

and in the `reply()` function, add:
```python
global DO_SLEEP
if msg:
    g.source.interaction_count += 1
    if DO_SLEEP:
        DO_SLEEP = False
        import time
        time.sleep(30)
    fnames.append(
        current_app.storage.save_message_submission(
            g.filesystem_id,
            g.source.interaction_count,
            journalist_filename,
            msg))
```
This will cause the first message to take a long time to submit, then the next to submit immediately.
Open two tabs as the source. Submit two messages in rapid succession.
Go into the Docker container and look at the DB:
```
sqlite> select * from submissions;
1|b5db5ac5-071f-49d1-b387-df5cda75c5c9|1|1-anomalous_yield-msg.gpg|604|1
2|d5f643cc-bd4c-486f-95bc-23e4c8c4ca51|1|2-anomalous_yield-msg.gpg|604|1
3|7a2d77af-cd72-4367-a6d4-4fc7da4c515e|2|1-concealing_semaphore-msg.gpg|604|1
4|2eac4b7b-a87c-486b-886b-a19919666f8b|2|2-concealing_semaphore-msg.gpg|604|1
5|f70ab493-4b98-44a2-94f8-75462349adfd|3|1-terminated_diarist-msg.gpg|602|0
6|20d5100e-8714-4654-b729-6fd8437d0f48|3|2-terminated_diarist-msg.gpg|600|0
7|33cd5216-1189-4beb-904e-bbdbfd5a6857|3|3-terminated_diarist-msg.gpg|593|0
8|76d735cf-ce5c-4950-9dd0-0a9dd94f21aa|3|3-terminated_diarist-msg.gpg|594|0
```
Note the last two rows: there are two messages with the same integer prefix.
Expected Behavior
It should not be possible to create messages with identical interaction counts.
Actual Behavior
It is.
Comments
A journalist or source should not receive messages out of order:
- In general
- Especially not in a high-security environment where clear communication has security implications
Possible Fixes
All of these calls should be wrapped in a SQLite exclusive transaction. This is not the same as the implicit transaction that is begun when the first `db.session.{add,delete}(...)` (or similar) call is made. The transaction needs to be started explicitly before the `Source` object is read, and committed afterwards.
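As a rough illustration of that flow (a sketch only, not the existing handler code: whether `BEGIN EXCLUSIVE` can be issued directly on the session like this depends on the pysqlite/SQLAlchemy transaction settings, and the `Submission(source, fname)` constructor is assumed from the current models):

```python
from sqlalchemy import text

# Take the write lock *before* reading the counter, so no other request can
# read the same interaction_count value in between.
db.session.execute(text("BEGIN EXCLUSIVE"))

source = Source.query.filter_by(filesystem_id=filesystem_id).one()
source.interaction_count += 1

fname = current_app.storage.save_message_submission(
    source.filesystem_id,
    source.interaction_count,
    source.journalist_filename,
    msg,
)
db.session.add(Submission(source, fname))
db.session.commit()  # releases the exclusive lock
```

The obvious cost is that every submission and reply then serializes on the whole database for the duration of the file write.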
Additionally, we could manually write some SQL that does the increment inside the SQL engine, so the interaction count (and therefore the filename) is assigned atomically and we avoid having to lock the entire table for a long time.
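A sketch of that idea (illustrative only; the `sources` table and column names are assumed from the models, and this is not existing SecureDrop code):

```python
from sqlalchemy import text

# Let SQLite perform the increment, so two concurrent requests can never read
# the same stale counter value.
db.session.execute(
    text(
        "UPDATE sources "
        "SET interaction_count = interaction_count + 1 "
        "WHERE filesystem_id = :fsid"
    ),
    {"fsid": filesystem_id},
)

# Read back the value this transaction just wrote; concurrent writers are
# serialized by SQLite's write lock, so each request sees a distinct count.
new_count = db.session.execute(
    text("SELECT interaction_count FROM sources WHERE filesystem_id = :fsid"),
    {"fsid": filesystem_id},
).scalar()

fname = current_app.storage.save_message_submission(
    filesystem_id, new_count, journalist_filename, msg
)
db.session.commit()
```

The read-back is still needed to build the filename, but because the increment happens inside the engine, two requests can no longer be handed the same number.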
@heartsucker nice catch on this one.
Also, what are the chances of hitting this by using the client? Otherwise it might be a good issue to solve, but moving to Xenial will take a lot of time from all of us.
Right now the client just fires off all replies as they're sent, so it's higher on the client than on the web UI. However, since a reply is a couple KB at most, they're going to be received and written to disk very fast. This means the window of this happening is very narrow. I don't think this is very high priority right now, but def should be tackled post Xenial. Maybe once I get replies finished on the client, I can try to resolve this because I generally have a goal of "strict conversation ordering" in the back of my head, and there's a few things that impact this. This ticket being possibly the only server side one.
Here to revive this issue and say that I ran into this when testing some reply-related updates on the client. When the client creates a new `Reply` object locally (because a `DraftReply` was just successfully sent and it's being promoted to a `Reply`), it uses the same logic. However, since the `interaction_count` is updated by the server (so it's updated on a successful sync), if the client fires off a bunch of replies before syncing with the server and experiences any network or sync issues, there could definitely be conflicting local vs. remote messages as above.
Another small note to our future selves: there is some sneaky behaviour with addition-assignment, and we might want to take another look at how we're incrementing those counters; see e.g. https://stackoverflow.com/questions/2334824/how-to-increase-a-counter-in-sqlalchemy/2334917#comment30697297_2334917
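As a hypothetical illustration of the difference the linked comment is getting at (not existing SecureDrop code):

```python
# Read-modify-write in Python: the current value is loaded into the session,
# bumped in Python, and written back. Two concurrent requests can both load
# the same value and both write back the same result.
g.source.interaction_count += 1

# SQL-expression assignment: SQLAlchemy emits
#   UPDATE sources SET interaction_count = interaction_count + 1 WHERE ...
# so the increment happens inside the database instead.
g.source.interaction_count = Source.interaction_count + 1
db.session.commit()
# After the commit the attribute is expired and must be re-read before the
# new value can be used to build a filename.
```

Neither form by itself fixes the duplicate-filename race, since the new value still has to be read back before the file is saved, but the expression form at least keeps the counter from losing increments.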