[FAQ]: Single replica failure mode
Are you reporting an issue with existing content?
This would be an omission in the FAQ.
Are you proposing new content, or a change to the existing documentation layout or structure?
I'd propose a new section be added outlining what one should expect after setting the replication count to 1 for a volume. The storage class parameters page says: "If set to one then the volume does not tolerate any node failure." but that doesn't really describe what will happen. If the node hosting the volume were to go offline, the disk would (obviously) go offline too, but:
- Would another node in the cluster create a new, empty volume that would then be "usable"?
- If the original node came back online, would the (now stale) volume be wiped?
- Or is the volume simply unavailable for as long as its host node is offline, and does that cause references to the volume to be removed from the cluster?
- If/when the node comes back online, does the volume reappear and everything continue as though nothing had happened?
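For context, single-replica volumes come from the `repl` storage class parameter. A minimal sketch of such a class, following the pattern on the storage class parameters page (the class name here is illustrative, and the provisioner string should be checked against your install):

```shell
# Define a StorageClass whose volumes carry exactly one replica.
# With repl: "1" the data lives on a single node, so losing that node
# means losing access to (and potentially the contents of) the volume.
kubectl apply -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-single-replica   # illustrative name
parameters:
  repl: "1"
  protocol: "nvmf"
provisioner: io.openebs.csi-mayastor   # verify against your deployment
EOF
```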
I'm sure a single-replica volume isn't recommended because of the potential for data loss, but there are certain use cases where data loss might be okay, and I'm curious about the ramifications of a node failure.
Tested this out on my cluster:
- Drained node, rebooted node
- Volume goes to `Unknown` state
- No new volume on another node is created
- Pod using volume crashes due to IO error
- PV stays around
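The test amounted to roughly the following; the commands are illustrative, and the volume inspection command assumes the Mayastor kubectl plugin is installed:

```shell
# Drain the node hosting the single replica, then reboot it.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# While the node is down, watch the volume and its consumers.
kubectl mayastor get volumes   # volume state shows Unknown
kubectl get pv                 # the PV is not deleted
kubectl get pods -w            # the consuming pod crashes with IO errors
```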
Once the node came back online, our pod remained in an error state and had to be force-terminated. The volume didn't seem to want to re-mount, and at that point we aborted the test and cleaned things up.
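(The force termination was just the standard kubectl override; the pod name is a placeholder:)

```shell
# Skip the graceful shutdown and remove the stuck pod object immediately.
kubectl delete pod <pod-name> --grace-period=0 --force
```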
Did you manage to collect any logs from the CSI node plugin?
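(i.e., something along these lines; the namespace, label, and container name are assumptions based on a typical Mayastor install, so adjust them to match your deployment:)

```shell
# Pull recent logs from the CSI node plugin pods.
kubectl logs -n mayastor -l app=mayastor-csi -c mayastor-csi --tail=500
```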
I have logs, but I don't think it was a Mayastor issue; I think Kubernetes was just freaking out.
@datacore-tilangovan can you please have this FAQ entry added to the documentation, describing the current behavior?
@datacore-tilangovan has this been done, or is it still pending?