Concurrently write ConsumerWriters on error

Open shaan420 opened this issue 10 months ago • 0 comments

What this PR does / why we need it: When writing to a ConsumerService, the m3msg producer randomly picks a ConsumerWriter that writes to one replica, then blocks until the write succeeds or fails. In certain cases, such as a deployment of the ConsumerService, the error path incurs very high latency (25s+). This creates a huge backlog in the m3aggregator message queue and drastically increases the consume latency of messages. To minimize the impact of these errors, this PR waits a configurable amount of time for a write to return. If the write has not returned by then, the producer opportunistically starts writing the message to another random replica concurrently. If that write succeeds and the message is acked, subsequent writes detect that the stalled ConsumerWriter is still active and skip over it, going straight to another ConsumerWriter. As soon as connection stability returns, the m3msg producer goes back to writing to a single replica.

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

shaan420, Feb 04 '25 13:02