Creating a snapshot does not verify that all nodes are writing to the same blobstore
Snapshots work by writing to a blobstore in which the same blob can be accessed at the same path across all nodes. By default we check that the blobstore is shared across nodes correctly when the repository is registered. This check helps catch config and permission errors, including cases where the underlying blobstore is not properly shared across all nodes. This check can be bypassed by users who need to register a blobstore which is unavailable at registration time but will become available later on.
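For context, the registration-time check is the one controlled by the repository API's `verify` query parameter, and it can be run manually later via the `_verify` endpoint. A sketch (the repository name and location here are hypothetical):

```shell
# Register a shared-filesystem repository but skip verification, e.g. because
# the shared mount is not yet available on every node.
curl -X PUT "localhost:9200/_snapshot/my_repository?verify=false" \
  -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": { "location": "/mnt/shared/backups" }
}'

# Once the blobstore is mounted everywhere, trigger verification manually.
curl -X POST "localhost:9200/_snapshot/my_repository/_verify"
```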
Today, if the blobstore is accessible but not shared (and the user bypassed the registration-time checks that would prevent this) then snapshot creation will report success, because we create snapshots without ever reading a blob that another node has written. Listing, restoring, and deleting snapshots may also sometimes appear to succeed. However, it's definitely not safe to rely on such a setup to protect your data.
We should not report success when creating a snapshot in such a setup. We can detect this sort of problem by having the master read at least one blob written by every data node during snapshot creation. We mustn't verify too many blobs (e.g. one per shard) since this would be slow and expensive without adding much extra protection.
I propose that the master reads the first `BlobStoreIndexShardSnapshot` that each data node writes, and fails the snapshot if that read fails. I think we don't need to re-check this on every snapshot creation; it should be enough to remember the past successes of nodes that have remained in the cluster since then. Possibly we should also re-check every 24h or so, just in case the repository gets unmounted out from under us.
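The caching policy described above could look roughly like this sketch (names and structure are hypothetical, not actual Elasticsearch internals): remember each node's last successful read, forget it when the node leaves the cluster, and expire it after 24 hours.

```python
import time

RECHECK_INTERVAL = 24 * 60 * 60  # seconds; re-verify about once a day

class VerifiedNodes:
    """Tracks which data nodes the master has successfully read a blob from."""

    def __init__(self, now=time.time):
        self.now = now          # injectable clock, handy for testing
        self.verified = {}      # node id -> time of last successful read

    def needs_check(self, node_id):
        last = self.verified.get(node_id)
        return last is None or self.now() - last > RECHECK_INTERVAL

    def record_success(self, node_id):
        self.verified[node_id] = self.now()

    def node_left(self, node_id):
        # A node that rejoins may have been reconfigured, so forget its success.
        self.verified.pop(node_id, None)

clock = [0.0]
v = VerifiedNodes(now=lambda: clock[0])
assert v.needs_check("node-a")        # never verified yet
v.record_success("node-a")
assert not v.needs_check("node-a")    # recent success is remembered
clock[0] += 25 * 60 * 60
assert v.needs_check("node-a")        # entry expired after 24h
v.record_success("node-a")
v.node_left("node-a")
assert v.needs_check("node-a")        # leaving the cluster resets the state
```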
Pinging @elastic/es-distributed (Team:Distributed)