prometheus Reduce the impact of remote write resharding

Open csmarchbanks opened this issue 4 years ago • 23 comments

Right now resharding, especially sharding up, is very disruptive to throughput. The resharding process drains all queues, which takes a significant amount of time if the remote endpoint is having issues. This blocks new samples from being appended while the queues clear, and one slow shard can cause throughput to drop significantly.

Instead of waiting for all shards to flush to remote storage, we could move their pending samples into the new shards that are being created, making sure to rebalance each sample into the appropriate shard.
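A rough sketch of what that rebalancing could look like, not the actual queue manager code. The `sample` type, `shardFor`, and `reshard` are hypothetical names, and it assumes a series is assigned to a shard by its reference modulo the shard count, so all pending samples for one series sit in a single old queue and keep their relative order when moved.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a queued remote write sample.
type sample struct {
	seriesRef uint64
	value     float64
}

// shardFor picks a shard for a series (assumed scheme: ref modulo shard count).
func shardFor(ref uint64, n int) int {
	return int(ref % uint64(n))
}

// reshard moves pending samples from the old queues into n new queues
// instead of waiting for the old queues to drain to the remote endpoint.
// Each old queue is walked in order, so per-series ordering is kept as long
// as every series lived in exactly one old queue.
func reshard(old [][]sample, n int) [][]sample {
	if n <= 0 {
		return nil
	}
	next := make([][]sample, n)
	for _, queue := range old {
		for _, s := range queue {
			i := shardFor(s.seriesRef, n)
			next[i] = append(next[i], s)
		}
	}
	return next
}

func main() {
	// Two old shards holding unsent samples; shard up to three.
	old := [][]sample{
		{{seriesRef: 2, value: 1}, {seriesRef: 4, value: 2}},
		{{seriesRef: 1, value: 3}, {seriesRef: 3, value: 4}, {seriesRef: 1, value: 5}},
	}
	for i, queue := range reshard(old, 3) {
		fmt.Printf("shard %d: %d pending samples\n", i, len(queue))
	}
}
```

The sketch ignores batches already in flight and concurrent appends, which is where most of the real complexity would be.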

May 08 '20 17:05 csmarchbanks

Resolved without Zephyr changes. The workaround was to use the SO_BINDTODEVICE socket option to bind a socket connection to a specified network interface.

Feb 20 '24 13:02 matt-wood-ct
