lemmy icon indicating copy to clipboard operation
lemmy copied to clipboard

Update counts using single task and PostgreSQL notification channel

Open dullbananas opened this issue 6 months ago • 6 comments

Requires https://github.com/weiznich/diesel_async/pull/251

dullbananas avatar Jul 17 '25 06:07 dullbananas

Before you get too far with this, does this mean moving away from table triggers, and having the count-related actions be backgrounded?

I'm wondering how safe/reliable this would be when you have multiple updates, and the triggers aren't holding any locks anymore.

dessalines avatar Jul 18 '25 14:07 dessalines

I might either:

  • Completely move counts away from triggers.
  • Only use the background task to update the is_counted column, and make the existing triggers use that column.

Either way, it's easy to guarantee that no increment or decrement is duplicated or permanently skipped.

Counter updates in response to inserts and updates updates will involve doing one of the following atomically:

  • Change is_counted from true to false, then increase counters.
  • Change is_counted from false to true, then decrease counters.

Deletion of rows needs to be handled very differently, but will still be reliable. What I will probably do is make purges use update instead of delete, and delete the rows in the background task.

dullbananas avatar Jul 18 '25 21:07 dullbananas

How is this going to work if an instance uses multiple Lemmy processes?

Nutomic avatar Jul 21 '25 08:07 Nutomic

Just to give my input after doing a ton of this bulk insert / update work...

Triggers can and do work well, I think we just need to optimize them to make sure they're limited to specific column updates, and don't necessarily fire on every insert or delete. Also to do as few joins as possible, and only increment or decrement by 1.

I'm not sure that any of the is_counted additions are necessary, especially since things might have different counts on the same source data. IE a total_report_count vs an unresolved_report_count can't have is_counted on the report row, because its counted differently it two different contexts.

#5854

dessalines avatar Jul 21 '25 15:07 dessalines

How is this going to work if an instance uses multiple Lemmy processes?

If an instance is correctly configured, then exactly one process updates counts, because the task is started by scheduled_tasks::setup. If multiple processes update counts at the same time, then updates might get split up, which is less efficient but has the same result because UPDATE queries hold exclusive locks and re-check the WHERE condition as needed.

The notification channel is not affected by whether or not the sending side and the receiving side are associated with the same Lemmy process.

IE a total_report_count vs an unresolved_report_count can't have is_counted on the report row, because its counted differently it two different contexts.

All three will be updated in the same transaction.

dullbananas avatar Jul 22 '25 02:07 dullbananas

Actually, the report count case might require an additional is_counted_in_unresolved column.

dullbananas avatar Jul 22 '25 02:07 dullbananas