influxdb Have snapshots continuously retry in the background

The official issue for what to do about #25676

Dec 19 '24 15:12 pauldix

@praveen-influx is this still relevant after all the snapshot work you recently did?

Apr 17 '25 20:04 pauldix

@pauldix - I think it's still valid. We don't stop incoming writes after failing to write X snapshots, as you'd outlined below:

Maybe the answer is that we simply have snapshotting do infinite retries and then on the ingest path once we get past X number of WAL files backlogged, we stop accepting writes. Then we have some CLI tooling that will be able to move those WAL files off to a backup location and then inspect and dump them later. That way the operator can get back up quickly if it just happened to be some intermittent bad thing.

I believe the current behavior is to keep accepting writes until the process eventually runs out of memory (I haven't tested this).
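
The idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not InfluxDB's actual code: `WalBacklog`, `BacklogFull`, `try_add_wal_file`, and `retry_snapshot_forever` are hypothetical names, the limit `max_pending` stands in for the "X" in the quote, and the capped exponential backoff is an assumption, since the thread only says "infinite retries".

```rust
use std::time::Duration;

/// Hypothetical error handed back to writers once the backlog limit is hit.
#[derive(Debug, PartialEq)]
pub struct BacklogFull {
    pub pending_wal_files: usize,
}

/// Hypothetical tracker for WAL files that have not been snapshotted yet.
pub struct WalBacklog {
    max_pending: usize, // the "X" from the quoted proposal
    pending: usize,
}

impl WalBacklog {
    pub fn new(max_pending: usize) -> Self {
        Self { max_pending, pending: 0 }
    }

    /// Ingest path: refuse the write once the backlog reaches the limit,
    /// instead of buffering without bound.
    pub fn try_add_wal_file(&mut self) -> Result<(), BacklogFull> {
        if self.pending >= self.max_pending {
            return Err(BacklogFull { pending_wal_files: self.pending });
        }
        self.pending += 1;
        Ok(())
    }

    /// Called once a snapshot has persisted everything in the backlog.
    pub fn clear(&mut self) {
        self.pending = 0;
    }

    pub fn pending(&self) -> usize {
        self.pending
    }
}

/// Retry `snapshot` until it succeeds, doubling the delay after each failure
/// up to `max_delay`. Returns the number of attempts that were made.
pub fn retry_snapshot_forever<F, E>(
    mut snapshot: F,
    initial_delay: Duration,
    max_delay: Duration,
) -> u64
where
    F: FnMut() -> Result<(), E>,
{
    let mut delay = initial_delay;
    let mut attempts = 0u64;
    loop {
        attempts += 1;
        if snapshot().is_ok() {
            return attempts;
        }
        std::thread::sleep(delay);
        delay = delay.saturating_mul(2).min(max_delay);
    }
}
```

In a real server the retry loop would run on a background task while the ingest path checks the backlog on each write. Rejecting writes at the limit bounds memory use, which is the failure mode the OOM concern describes.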

Apr 22 '25 13:04 praveen-influx

We'll close this as won't fix, since it won't be relevant once the new storage engine arrives.

Jun 23 '25 18:06 pauldix