python-mapswipe-workers

Firebase transactions get aborted when a lot of mapping happens

Open Hagellach37 opened this issue 3 years ago • 2 comments

It might be better not to use a Firebase transaction when transferring results, and to do it the "normal" way instead: download the results and, once they have been inserted into the Postgres DB, update the refs in Firebase.

Currently, a transaction gets aborted if someone adds a new result before the transaction has managed to transfer all the data. When a lot of users are mapping for a project, this means the transaction might never succeed. This blocks the workers in general, so no results are transferred for other projects either. From the Firebase SDK documentation:

"If another client writes to this location before the new value is successfully saved, the update function is called again with the new current value, and the write will be retried."
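To illustrate why this retry behaviour can starve the transfer, here is a toy model in plain Python. It is not the Firebase SDK: `ToyLocation`, `run_transaction`, `add_result` and the retry limit of 5 are all made up for this sketch, and the real client's retry mechanics may differ.

```python
class ToyLocation:
    """In-memory stand-in for a database location (hypothetical, not Firebase)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def write(self, value):
        self.value = value
        self.version += 1


def run_transaction(location, update_fn, concurrent_writes, max_retries=5):
    """Retry update_fn until no other write lands during an attempt.

    concurrent_writes is a list of callables; one is consumed per attempt
    to simulate another client writing while we are still working.
    Returns (committed, attempts).
    """
    for attempt in range(1, max_retries + 1):
        seen_version = location.version
        new_value = update_fn(location.value)
        if concurrent_writes:
            concurrent_writes.pop(0)(location)  # another client adds a result
        if location.version == seen_version:
            location.write(new_value)
            return True, attempt
    return False, max_retries


def add_result(location):
    """Simulated mapper adding one new result to the location."""
    new_value = dict(location.value)
    new_value[f"result_{location.version}"] = 1
    location.write(new_value)


# Few mappers: two interfering writes, then the transfer commits.
quiet = ToyLocation({"result_a": 1})
quiet_outcome = run_transaction(quiet, lambda current: {}, [add_result, add_result])

# Many mappers: every attempt is interrupted, so the transfer never commits.
busy = ToyLocation({"result_a": 1})
busy_outcome = run_transaction(busy, lambda current: {}, [add_result] * 10)
```

In this model the quiet project commits on the third attempt, while the busy project runs out of retries and its results keep piling up.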

When not using transactions, we need to make sure that we only delete results that we have actually transferred.
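A minimal sketch of how that could look, assuming the downloaded results are shaped like `{group_id: {user_id: result_data}}` (an assumption, not taken from the code). The idea is to build the delete instructions from the snapshot we actually inserted into Postgres, never from the whole project node. `build_delete_update` and `apply_update` are hypothetical helpers; `apply_update` only imitates a multi-path update in which a `None` value removes a path, and whether the real client accepts such a dict would need to be checked.

```python
def build_delete_update(transferred_results):
    """Map every transferred "group_id/user_id" path to None (meaning: delete)."""
    return {
        f"{group_id}/{user_id}": None
        for group_id, users in transferred_results.items()
        for user_id in users
    }


def apply_update(current, update):
    """Toy stand-in for a multi-path update where None deletes a path."""
    remaining = {group_id: dict(users) for group_id, users in current.items()}
    for path, value in update.items():
        group_id, user_id = path.split("/")
        if value is None:
            remaining.get(group_id, {}).pop(user_id, None)
    # Drop groups that have no results left.
    return {group_id: users for group_id, users in remaining.items() if users}


# Snapshot that was downloaded and inserted into Postgres.
snapshot = {
    "g101": {"userA": {"t1": 1}},
    "g102": {"userB": {"t7": 0}},
}

# State in Firebase afterwards: userC submitted a result in the meantime.
live = {
    "g101": {"userA": {"t1": 1}, "userC": {"t1": 0}},
    "g102": {"userB": {"t7": 0}},
}

update = build_delete_update(snapshot)
remaining = apply_update(live, update)
```

Here `remaining` still contains the result from `userC`, which would be picked up by the next transfer run instead of being deleted unseen.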

When a transaction fails, this triggers the download of results again, so it might also have a big impact on the amount of data we download from Firebase. Downloading the same results over and over again is something we should avoid.

When results are not transferred, this also means that the progress can't be calculated.

code mainly here: https://github.com/mapswipe/python-mapswipe-workers/blob/1911294053795e0b4e9d08aa5d45962d11cb45a9/mapswipe_workers/mapswipe_workers/firebase_to_postgres/transfer_results.py#L11

Hagellach37 · Apr 19 '21 08:04

Hey @laurentS @Matthias-Schaub , it would be great to get an opinion from you on this. :)

Hagellach37 · Apr 19 '21 08:04

Together with @laurentS we discussed some ideas:

The old version

  • just for reference: it takes 7 seconds to transfer results for 21 groups
  • around 3 seconds to get user_ids from postgres
  • around 4 seconds to insert results in postgres table

con:

  • transaction may get aborted when a lot of users map at the same time

Transactions on the group level

  • we could keep using transactions, but apply them one level below
  • we would run a transaction per group instead of one for all results of a project

con:

  • this is rather slow when just using "normal" queries
  • transferring results for 21 groups took around 80 seconds
  • this is too slow in situations where many results get added

Do not use transaction

  • keep the logic on the project level
  • this is how we did it before mapswipe backend version 2, e.g. check here: https://github.com/mapswipe/python-mapswipe-workers/blob/e0116879df1c55d3184c04c7893e9b17add645ca/mapswipe_workers/basic/BaseFunctions.py#L746
  • this takes around 7 seconds, so performance is similar to the existing workflow

con:

  • using a Firebase transaction might be a bit safer (it helps avoid deleting data we don't want to delete)

Hagellach37 · Apr 19 '21 23:04