editgroups
Support undoing archived batches
https://editgroups.toolforge.org/b/CB/8f5438bc04fc/ I would like to undo that one. How do I do that?
This is indeed not supported yet: it would first require unarchiving the batch. This means re-fetching all the edits in the batch, for instance from the user's contributions. Then the batch could be undone as usual.
I would like to help get this done. Can someone point me in the right direction?
Of course!
When archiving batches, we delete all of their edits from our database except the last 10. This keeps EditGroups' own database from growing too much over time. Batches are archived periodically by this method: https://github.com/Wikidata/editgroups/blob/b674c847e2149c9c55f9bfa2348692beb8ed398c/store/models.py#L267-L277
So, to undo a batch that has been archived, we need to re-fetch the edits that this method deleted.
Because this will generally take a while, we want to make sure that the periodic task does not archive the batch again while we are re-fetching its edits, nor for some time after the batch has been unarchived. This could be done by adding a date field to the Batch model that records when the batch was last unarchived, and only archiving batches for which this date is null or older than some threshold. That could be the first step: add this new field to the Batch class, generate the corresponding migration with ./manage.py makemigrations, and update the archival task to take it into account.
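As a rough sketch of that archival rule, assuming a hypothetical nullable field called `last_unarchived` (for instance `models.DateTimeField(null=True, blank=True)`) and an arbitrary 7-day grace period, the eligibility check could look like this in plain Python:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical grace period: how long an unarchived batch stays protected
# from being archived again by the periodic task.
UNARCHIVAL_GRACE_PERIOD = timedelta(days=7)


def can_be_archived(last_unarchived: Optional[datetime], now: datetime,
                    grace: timedelta = UNARCHIVAL_GRACE_PERIOD) -> bool:
    """Return True if the periodic task may archive this batch.

    `last_unarchived` mirrors the hypothetical date field on Batch:
    None means the batch has never been unarchived.
    """
    if last_unarchived is None:
        return True
    # Only archive once the last unarchival is strictly older than the threshold.
    return last_unarchived < now - grace
```

In the archival task itself, the same rule would more naturally be a queryset filter, something like `Batch.objects.filter(Q(last_unarchived__isnull=True) | Q(last_unarchived__lt=now - grace))` with `Q` imported from `django.db.models`. Setting the field when the unarchival *starts*, not when it finishes, would also protect the batch while its edits are being re-fetched.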
Then, for the unarchiving itself, as mentioned above, we could query the MediaWiki API for the user's contributions. Our batch metadata records when the batch started and ended, so we only need to fetch the contributions between those two times. The tasks that come to mind here are:
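For reference, contributions between two timestamps can be requested from the `list=usercontribs` module of the MediaWiki Action API (for Wikidata, `https://www.wikidata.org/w/api.php`). A minimal sketch of the query parameters, assuming the batch start and end times are available as timezone-aware datetimes (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def to_mw_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp for the API."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_usercontribs_params(user: str, batch_start: datetime,
                              batch_end: datetime, limit: int = 500) -> dict:
    """Query parameters listing `user`'s edits between the batch bounds.

    With ucdir=newer the results are chronological, and the API then expects
    ucstart to be the *earlier* timestamp and ucend the later one (with the
    default ucdir=older it is the other way round).
    """
    if batch_start > batch_end:
        raise ValueError("batch_start must not be after batch_end")
    return {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "ucstart": to_mw_timestamp(batch_start),
        "ucend": to_mw_timestamp(batch_end),
        "ucdir": "newer",
        "uclimit": limit,
        # Fields needed to rebuild an edit and recognise its batch.
        "ucprop": "ids|title|timestamp|comment|flags",
        "format": "json",
        "formatversion": 2,
    }
```

`ucdir=newer` returns the edits in the order they were made. Since 500 is the usual per-request maximum for non-bot accounts, larger batches have to be paged through with API continuation.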
- parse the API response to represent the edits in the same format as what we are retrieving from the EventStream API (so that we can use as much common ingestion logic as possible, between the EventStream and the MediaWiki API use cases).
- filter out the edits which do not belong to the batch we are trying to unarchive (so that we only add back that batch's edits)
- connect up all this code into a Celery task (so that this can be run asynchronously, independently from the web request that triggered the unarchival).
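The first two tasks could be sketched roughly as follows. The mapping is only an assumption: the output keys follow the public `recentchange` event schema, but which keys EditGroups' ingestion code actually reads needs checking against the code base. The batch is also assumed, hypothetically, to be identified by an EditGroups link in the edit summary such as `([[:toollabs:editgroups/b/CB/8f5438bc04fc|details]])`; real tools may use other summary formats. Input items are assumed to come from `usercontribs` with `formatversion=2`.

```python
import re
from datetime import datetime, timezone

# Hypothetical matcher for an EditGroups link in the edit summary.
EDITGROUPS_LINK = re.compile(r"editgroups/b/(?P<tool>[A-Za-z0-9]+)/(?P<uid>[A-Za-z0-9]+)")


def contrib_to_event(contrib: dict, wiki: str = "wikidatawiki") -> dict:
    """Map one usercontribs item to an EventStream-like dict (assumed subset of keys)."""
    ts = datetime.strptime(contrib["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    # parentid is 0 for page creations; represent that as "no previous revision".
    parent = contrib.get("parentid") or None
    return {
        "type": "new" if contrib.get("new") else "edit",
        "wiki": wiki,
        "namespace": contrib["ns"],
        "title": contrib["title"],
        "user": contrib["user"],
        "comment": contrib.get("comment", ""),  # may be absent if the summary was hidden
        "timestamp": int(ts.timestamp()),  # EventStream uses Unix seconds
        "minor": bool(contrib.get("minor", False)),
        "revision": {"old": parent, "new": contrib["revid"]},
    }


def belongs_to_batch(event: dict, tool: str, uid: str) -> bool:
    """True if the edit summary links to the given EditGroups batch."""
    match = EDITGROUPS_LINK.search(event["comment"])
    return match is not None and match.group("tool") == tool and match.group("uid") == uid
```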
This is the core backend work for this issue, and it would be worth writing test cases for it (the code base is fairly extensively tested, so it should not be too hard to imitate the existing tests).
Finally, we would need to expose the unarchiving feature in the frontend. This means adding a route that lets the user trigger the unarchiving, and exposing the corresponding button by modifying the template that displays batches: ./store/templates/store/batch.html.
I hope this description is not too daunting and I'd be happy to give more details where needed!
I just thought of another, less demanding approach: instead of adding support for unarchiving batches, we could "simply" fetch the edits from the MediaWiki API when undoing them, in a streaming fashion. This means that we would not even need to save them back to the database.
It probably also gives a better UX: the undo button would remain available on all batches, archived or not, and users would not need to unarchive a batch before undoing it. Unarchiving a batch would arguably be useful for other purposes (for instance, to download the CSV of all its edits), but that is probably minor compared to undoing.
With this approach, the core of the work would be to write a Python generator that iterates through the edits of a particular user between two timestamps, filtering out the edits that do not belong to a particular batch. This generator would then be used in the undoing task in place of the iteration over the edits stored in the database. We would also need some small tweaks to update the number of undone edits in the Batch metadata (currently this count is updated by the general edit ingestion logic, not by the undoing code), but that does not seem too difficult.
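A minimal sketch of such a streaming generator follows. It assumes a hypothetical `fetch` callable that sends a parameter dict to the API and returns the decoded JSON (for instance a thin wrapper around an HTTP client), and a caller-supplied `in_batch` predicate. Continuation follows the documented MediaWiki convention of merging the returned `continue` object into the next request. The `undo_streamed_batch` helper is likewise hypothetical; it only shows where the undone-edit count could be computed, since the ingestion logic never sees these edits.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical: sends `params` to api.php and returns the decoded JSON response.
Fetch = Callable[[dict], dict]


def iter_usercontribs(fetch: Fetch, params: dict) -> Iterator[dict]:
    """Lazily yield every usercontribs item, following API continuation."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield from response.get("query", {}).get("usercontribs", [])
        if "continue" not in response:
            return
        # Send back everything in "continue" (e.g. uccontinue) with the original query.
        request = {**params, **response["continue"]}


def iter_batch_edits(fetch: Fetch, params: dict,
                     in_batch: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the contributions that belong to the batch being undone."""
    for contrib in iter_usercontribs(fetch, params):
        if in_batch(contrib):
            yield contrib


def undo_streamed_batch(edits: Iterable[dict],
                        undo_one: Callable[[dict], None]) -> int:
    """Undo each streamed edit with the hypothetical `undo_one`; return the count.

    The caller would store this count in the Batch metadata.
    """
    undone = 0
    for contrib in edits:
        undo_one(contrib)
        undone += 1
    return undone
```

Because every stage is a generator, no request is sent until iteration starts, only one page of contributions is held in memory at a time, and nothing is written back to the database.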