Update counts using single task and PostgreSQL notification channel
Requires https://github.com/weiznich/diesel_async/pull/251
Before you get too far with this, does this mean moving away from table triggers, and having the count-related actions be backgrounded?
I'm wondering how safe/reliable this would be when you have multiple updates, and the triggers aren't holding any locks anymore.
I might either:
- Completely move counts away from triggers.
- Only use the background task to update the
is_countedcolumn, and make the existing triggers use that column.
Either way, it's easy to guarantee that no increment or decrement is duplicated or permanently skipped.
Counter updates in response to inserts and updates updates will involve doing one of the following atomically:
- Change
is_countedfromtruetofalse, then increase counters. - Change
is_countedfromfalsetotrue, then decrease counters.
Deletion of rows needs to be handled very differently, but will still be reliable. What I will probably do is make purges use update instead of delete, and delete the rows in the background task.
How is this going to work if an instance uses multiple Lemmy processes?
Just to give my input after doing a ton of this bulk insert / update work...
Triggers can and do work well, I think we just need to optimize them to make sure they're limited to specific column updates, and don't necessarily fire on every insert or delete. Also to do as few joins as possible, and only increment or decrement by 1.
I'm not sure that any of the is_counted additions are necessary, especially since things might have different counts on the same source data. IE a total_report_count vs an unresolved_report_count can't have is_counted on the report row, because its counted differently it two different contexts.
#5854
How is this going to work if an instance uses multiple Lemmy processes?
If an instance is correctly configured, then exactly one process updates counts, because the task is started by scheduled_tasks::setup. If multiple processes update counts at the same time, then updates might get split up, which is less efficient but has the same result because UPDATE queries hold exclusive locks and re-check the WHERE condition as needed.
The notification channel is not affected by whether or not the sending side and the receiving side are associated with the same Lemmy process.
IE a
total_report_countvs anunresolved_report_countcan't haveis_countedon the report row, because its counted differently it two different contexts.
All three will be updated in the same transaction.
Actually, the report count case might require an additional is_counted_in_unresolved column.