MSC4140 delayed event rate limit failures are not handled
As per https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/expiring-events-keep-alive/proposals/4140-delayed-events-futures.md#rate-limiting-at-the-point-of-sending we are applying rate limiting at the point of the delayed event being entered into the DAG.
However, we don't handle this temporary failure and retry.
This means that the delayed event then gets dropped.
This was observed in the wild yesterday.
MSC4140
https://github.com/element-hq/synapse/pull/18018 contains a proposed fix.
However, the approach in that PR is to simply disable that rate limit.
If that PR is accepted then the should be some follow-up on this issue here to get the original security consideration mitigated. For example, one of:
- taking account of delayed event scheduling when accepting the delayed event in the first place
- adding the rate limit back in but handling the rate limit failure sensibly
@hughns @AndrewFerr I agree that adding handling for exceptions raised by the ratelimiting code and then attempting to send the event with the delay suggested by the rate-limiter makes sense.
My only worry is that you could end up with a very large "backlog" of events, which is completely invisible to the client. I'm not sure if this would hurt the UX of any use cases you're aware of?
Is https://github.com/element-hq/synapse/pull/18019 fixing this issue?
And more specifically is it also tackling
... follow-up on this issue here to get the original security consideration mitigated. For example, one of:
- taking account of delayed event scheduling when accepting the delayed event in the first place
- adding the rate limit back in but handling the rate limit failure sensibly
Is #18019 fixing this issue?
No, that one is just putting a rate limit on restarting/canceling/listing delayed events.