synapse icon indicating copy to clipboard operation
synapse copied to clipboard

MSC4140 delayed event rate limit failures are not handled

Open hughns opened this issue 1 year ago • 4 comments

As per https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/expiring-events-keep-alive/proposals/4140-delayed-events-futures.md#rate-limiting-at-the-point-of-sending we are applying rate limiting at the point of the delayed event being entered into the DAG.

However, we don't handle this temporary failure and retry.

This means that the delayed event then gets dropped.

This was observed in the wild yesterday.


MSC4140

hughns avatar Dec 10 '24 09:12 hughns

https://github.com/element-hq/synapse/pull/18018 contains a proposed fix.

However, the approach in that PR is to simply disable that rate limit.

If that PR is accepted then the should be some follow-up on this issue here to get the original security consideration mitigated. For example, one of:

  • taking account of delayed event scheduling when accepting the delayed event in the first place
  • adding the rate limit back in but handling the rate limit failure sensibly

hughns avatar Dec 10 '24 09:12 hughns

@hughns @AndrewFerr I agree that adding handling for exceptions raised by the ratelimiting code and then attempting to send the event with the delay suggested by the rate-limiter makes sense.

My only worry is that you could end up with a very large "backlog" of events, which is completely invisible to the client. I'm not sure if this would hurt the UX of any use cases you're aware of?

anoadragon453 avatar Dec 19 '24 16:12 anoadragon453

Is https://github.com/element-hq/synapse/pull/18019 fixing this issue?

And more specifically is it also tackling

... follow-up on this issue here to get the original security consideration mitigated. For example, one of:

  • taking account of delayed event scheduling when accepting the delayed event in the first place
  • adding the rate limit back in but handling the rate limit failure sensibly

fkwp avatar Feb 21 '25 12:02 fkwp

Is #18019 fixing this issue?

No, that one is just putting a rate limit on restarting/canceling/listing delayed events.

AndrewFerr avatar Feb 21 '25 13:02 AndrewFerr