spring-modulith

Minimising contention on event_publish table and pessimistic lock exceptions

Open matiwinnetou opened this issue 1 year ago • 2 comments

In our system we have a lot of small events flying around. One of the reasons they are small rather than batched is that indices cannot be created on large events: as you may remember, Modulith creates an index on the body of the event itself, which causes issues on e.g. PostgreSQL.

The Event_Publication table is therefore updated very often, which leads to pessimistic lock exceptions spamming the output. Issues arise especially when completing a given event. I have some questions here:

  • what is the most common way to deal with these problems?
  • is it caused by transactions on our side running longer than expected?
  • are the events that hit pessimistic lock issues retried by Modulith or the Spring Framework?
  • is there any way to suppress the output of all those warnings if the events are retried once the locks free up?
  • have you observed such behaviour in the various systems you or others are working on?
  • anything else that may be useful for dealing with these contention issues, e.g. tips?

matiwinnetou avatar Mar 28 '24 14:03 matiwinnetou

We are experiencing similar problems. In our case, we create multiple events (1 - 5) in a loop for the same event handler. We do this to ensure that the events are handled independently. We expect failures in the event handling process and have implemented retries and failure notifications. However, problems with the event publication registry (locking, deadlocks) seem to prevent our application code from executing. This results in endlessly spinning loaders in our frontend because neither success handlers nor failure handlers are executed. Retries by Modulith don't seem to be supported at the moment.

danstooamerican avatar Mar 29 '24 08:03 danstooamerican

So those events that are contended and causing locking issues will not be retried? I am really surprised, because I think I saw them being replayed somehow. I am testing on a system where I know how many events I should receive, and eventually they all arrived, I think, but I am not 100% sure.

We are also looping, btw, but in our case we partition large data into "batches". This helps because the events are then not tiny, but they are also not very large.

As for Modulith not supporting retries, I think this is because retries could cause issues in a multi-server setup. @odrotbohm recommends first using something like ShedLock, JobRunr or similar, which prevents multiple jobs from running across instances at the same time.

matiwinnetou avatar Mar 29 '24 14:03 matiwinnetou
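
The multi-server concern raised above comes down to letting only one runner retry incomplete publications at a time. Below is a minimal sketch of that idea in plain Java, assuming a hypothetical `SingleRunnerGuard` class and a caller-supplied retry task. The in-process flag only stands in for a shared, database-backed lock of the kind ShedLock provides; on its own it does not coordinate across servers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not part of Spring Modulith or ShedLock):
 * lets at most one caller at a time run a retry task.
 * The AtomicBoolean is an in-process stand-in for a distributed lock.
 */
class SingleRunnerGuard {

    private final AtomicBoolean held = new AtomicBoolean(false);

    /**
     * Runs the task only if nobody else currently holds the guard.
     *
     * @return true if the task ran, false if this round was skipped
     */
    boolean runIfFree(Runnable task) {
        // Try to take the guard; give up immediately instead of waiting,
        // so a busy round is skipped rather than queued behind the other runner.
        if (!held.compareAndSet(false, true)) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            // Always release, even if the task throws.
            held.set(false);
        }
    }
}
```

A scheduled job could wrap its retry call in `guard.runIfFree(...)` and simply skip the round when it returns `false`, so two runners never work on the same incomplete publications concurrently.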