kv: cluster unavailability in proximity to node drain

Open ajstorm opened this issue 1 year ago • 0 comments

On the DRT cluster, we're seeing a brief period of complete unavailability around the time that a node is drained. The recently introduced chaos script periodically drains, kills, or disk-stalls nodes and then recovers from the outage. In a recent run, half of all ranges became unavailable:

image

At the same time, the foreground workload drops to zero momentarily:

image

This is being discussed further here.

ajstorm • Feb 22 '24 16:02