python-mapswipe-workers
Firebase transactions get aborted when a lot of mapping happens
It might be better not to use a Firebase transaction when transferring results, but to do it the "normal" way: download the results and, once they have been inserted into the Postgres db, update the refs in Firebase.
Currently, a transaction gets aborted if it doesn't manage to transfer all data before someone adds a new result. When a lot of users are mapping for a project, this means that the transaction might never succeed. This then blocks the workers in general, so no results are transferred for other projects either. From the Firebase SDK documentation:
"If another client writes to this location before the new value is successfully saved, the update function is called again with the new current value, and the write will be retried."
When not using transactions, we need to make sure that we only delete results that we have transferred in the first place.
When a transaction fails, this triggers the download of the results again. Thus this might also have a big impact on the amount of data we download from Firebase. Downloading the same results over and over again is something we should avoid.
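To make the retry behaviour described in the SDK quote more concrete, here is a small in-memory simulation. It does not talk to Firebase: `FakeLocation`, `run_transaction`, `clear_results` and the retry limit of 25 are hypothetical stand-ins chosen for illustration, not the SDK's API.

```python
class TransactionAborted(Exception):
    """Stand-in for the SDK's abort error, raised when retries run out."""


class FakeLocation:
    """In-memory stand-in for one database location, with a version counter."""

    def __init__(self, value):
        self.value = dict(value)
        self.version = 0

    def write(self, value):
        self.value = dict(value)
        self.version += 1


def run_transaction(location, update, interfering_writes, max_retries=25):
    """Optimistic retry loop; returns the number of attempts that were needed.

    The first `interfering_writes` attempts are each interrupted by another
    client adding a result, which forces `update` to be called again.
    """
    for attempt in range(1, max_retries + 1):
        seen_version = location.version
        new_value = update(dict(location.value))
        if attempt <= interfering_writes:
            # A mapper submits a new result while our update is still running.
            location.write({**location.value, f"new_result_{attempt}": 1})
        if location.version == seen_version:
            # Nobody wrote in the meantime, so our write is accepted.
            location.write(new_value)
            return attempt
    raise TransactionAborted(f"gave up after {max_retries} attempts")


def clear_results(current):
    """Update function: pretend to transfer `current`, then empty the location."""
    return {}


# Few mappers: the transaction succeeds after a handful of retries.
quiet = FakeLocation({"result_0": 1})
print("attempts needed:", run_transaction(quiet, clear_results, interfering_writes=3))

# Many mappers: every attempt is interrupted, so the transaction is aborted.
busy = FakeLocation({"result_0": 1})
try:
    run_transaction(busy, clear_results, interfering_writes=1000)
except TransactionAborted as error:
    print("aborted:", error, "| results still waiting:", len(busy.value))
```

In the busy case every attempt re-reads a larger set of results and nothing is ever cleared, which matches the repeated downloads and the blocked workers described above.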
When results are not transferred this also means that the progress can't be calculated.
code mainly here: https://github.com/mapswipe/python-mapswipe-workers/blob/1911294053795e0b4e9d08aa5d45962d11cb45a9/mapswipe_workers/mapswipe_workers/firebase_to_postgres/transfer_results.py#L11
Hey @laurentS @Matthias-Schaub , it would be great to get an opinion from you on this. :)
Together with @laurentS we discussed some ideas:
The old version
- just for reference: it takes 7 seconds to transfer results for 21 groups
- around 3 seconds to get user_ids from postgres
- around 4 seconds to insert results in postgres table
con:
- the transaction may get aborted when a lot of users map at the same time
Transactions on the group level
- we could keep using transactions, but apply them one level below
- we would use a transaction not for all results of a project, but for each group
con:
- this is rather slow, when just using "normal" queries
- transferring results for 21 groups took around 80 seconds
- this is too slow in situations when many results get added
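A rough sketch of what the group-level variant could look like. It assumes that `results_ref` behaves like a `firebase_admin` database reference (offering `child()` and `transaction()`), and that `save_to_postgres` is a hypothetical helper that writes one group's results. Neither the function name nor the parameters are taken from the existing code.

```python
def transfer_groups(results_ref, group_ids, save_to_postgres, abort_errors=(Exception,)):
    """Transfer results with one transaction per group.

    Returns the ids of groups whose transaction was aborted, so that they can
    be retried on the next run without blocking the other groups.
    """
    aborted = []
    for group_id in group_ids:

        def transfer(current, group_id=group_id):
            # The SDK may call this function several times if the group
            # changes while it runs, so save_to_postgres must tolerate repeats.
            if current is not None:
                save_to_postgres(group_id, current)
            return {}  # writing an empty value clears the group's results

        try:
            results_ref.child(group_id).transaction(transfer)
        except abort_errors:
            aborted.append(group_id)
    return aborted
```

With `firebase_admin` one would probably pass `abort_errors=(db.TransactionAbortedError,)`. A busy group then only delays itself, but each group costs at least one extra round trip, which is consistent with the slowdown measured above.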
Do not use transactions
- keep the logic on the project level
- this is how we did it before mapswipe backend version 2, e.g. check here: https://github.com/mapswipe/python-mapswipe-workers/blob/e0116879df1c55d3184c04c7893e9b17add645ca/mapswipe_workers/basic/BaseFunctions.py#L746
- this takes around 7 seconds, so the performance is similar to the existing workflow
con:
- using a Firebase transaction might be a bit safer (it helps to avoid deleting data we don't want to delete)
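One way to reduce that risk without a transaction is to delete only the paths that were present in the downloaded snapshot. The sketch below builds such a delete request. It assumes the results are nested as group id, then user id (an assumption about the data layout), and `build_delete_update` is a hypothetical helper name.

```python
def build_delete_update(transferred_results):
    """Map every transferred "<group_id>/<user_id>" path to None.

    Only paths found in the downloaded snapshot are listed, so results that
    were added to Firebase after the download are left untouched.
    """
    update = {}
    for group_id, users in transferred_results.items():
        for user_id in users:
            update[f"{group_id}/{user_id}"] = None
    return update


# Snapshot downloaded at the start of the transfer (hypothetical ids).
snapshot = {
    "g101": {"userA": {"results": {"t1": 1}}, "userB": {"results": {"t1": 0}}},
    "g102": {"userA": {"results": {"t7": 2}}},
}
delete_update = build_delete_update(snapshot)
print(delete_update)
```

After the insert into Postgres has succeeded, this dictionary could be sent as a single multi-location update on the project's results ref, since writing a null value to a path removes it in the Realtime Database. A result that, say, `userC` adds to `g102` after the download is not part of the dictionary and therefore stays in Firebase for the next run. An empty snapshot yields an empty dictionary, in which case the update should be skipped.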