Allow volume create if the bricks can be created on a quorum of nodes
In GD1's world, if one node of a three-node cluster goes down, all volume creation requests fail. In the container world, this becomes a bottleneck, especially with intelligent volume provisioning. GD2 should provide a capability that lets volume creation go through in such a situation. Once the glusterd service comes back on the pod, there should be an automatic way to replay all the pending brick provisioning on that pod.
Draft design:
Instead of depending on the Transaction to replay the steps when that node comes up, we can maintain a flag in the Brick structure to differentiate provisioned bricks from unprovisioned ones.
type Brick struct {
    RootDevice  string
    VgName      string
    LvName      string
    Path        string
    Host        string
    Provisioned bool // false until the brick is actually created on its peer
}
During initial planning, the first iteration plans using the list of online peers. If a quorum of bricks can be placed on online peers, look for devices from the list of offline peers for the remaining bricks. If bricks are selected from offline peers, mark their Provisioned flag as false. The Transaction step will not try to create a brick if brick.Provisioned == false.
Brick Start needs to be enhanced to check the above flag to decide whether the brick needs to be provisioned.
During Brick Start (during Volume start, Glusterd2 start, or Volume set):
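To make the planning step concrete, here is a rough sketch. It is not GD2 code: `Peer`, `planBricks`, and `ErrNoQuorum` are hypothetical names, quorum is assumed to mean a simple majority of the replica count, one brick is placed per peer, and device selection is left out entirely. Bricks planned on online peers are marked `Provisioned: true` on the assumption that the Transaction step creates them as part of the same request.

```go
package main

import "errors"

// Peer is a hypothetical, simplified view of a cluster member.
type Peer struct {
	ID     string
	Online bool
}

// Brick carries only the fields this sketch needs.
type Brick struct {
	Host        string
	Provisioned bool
}

// ErrNoQuorum is returned when too few peers are online or known.
var ErrNoQuorum = errors.New("cannot place a quorum of bricks on online peers")

// planBricks places one brick per peer. Online peers are used first; offline
// peers fill the remaining slots only if a quorum of bricks already fits on
// online peers. Quorum is assumed to be a simple majority of replica.
func planBricks(peers []Peer, replica int) ([]Brick, error) {
	quorum := replica/2 + 1

	var online, offline []Peer
	for _, p := range peers {
		if p.Online {
			online = append(online, p)
		} else {
			offline = append(offline, p)
		}
	}
	if len(online) < quorum || len(peers) < replica {
		return nil, ErrNoQuorum
	}

	bricks := make([]Brick, 0, replica)
	// First iteration: bricks the Transaction step will create right away.
	for _, p := range online {
		if len(bricks) == replica {
			break
		}
		bricks = append(bricks, Brick{Host: p.ID, Provisioned: true})
	}
	// Second iteration: bricks deferred until their peer comes back.
	for _, p := range offline {
		if len(bricks) == replica {
			break
		}
		bricks = append(bricks, Brick{Host: p.ID, Provisioned: false})
	}
	return bricks, nil
}
```

With peers `a` and `c` online and `b` offline, a replica 3 request would yield bricks on `a` and `c` to be created immediately and a deferred brick on `b`.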
if !brick.Provisioned {
    provisionBrick(brick)
    brick.Provisioned = true
    updateVolumeInfo() // After the brick.Provisioned change
}
generateBrickVolfile()
startBrick()
Changes required:
- During Volume create, the device's Available size is currently updated by running the respective lvm command to get the actual available size. This needs to change to subtracting the claimed size from the current Available size, since the lvm command cannot be run on an offline peer.
- During Volume delete, instead of running the lvm command to get the available size, just add the reclaimed size to the current Available size.
- Changes to Start Brick function to provision the brick if not provisioned.
- Changes to Brick structure
- Replace brick needs to be enhanced to understand this state when brick replace is requested on a brick that is yet to be provisioned (for example, by skipping the operations normally done on the source brick device or peer).
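The size accounting change in the first two items could be sketched as below. `Device`, `claim`, and `reclaim` are hypothetical names for this illustration, and sizes are assumed to be in bytes. The arithmetic ignores any difference between the requested size and what LVM actually allocates (for example metadata or extent rounding), so the tracked value may drift from what the lvm command would report.

```go
package main

import "fmt"

// Device is a hypothetical, simplified device record kept in the store.
type Device struct {
	Name          string
	AvailableSize uint64 // bytes, tracked without querying lvm
}

// claim subtracts the size reserved for a new brick. It works the same way
// whether the peer owning the device is online or offline.
func (d *Device) claim(size uint64) error {
	if size > d.AvailableSize {
		return fmt.Errorf("device %s: need %d bytes, only %d available",
			d.Name, size, d.AvailableSize)
	}
	d.AvailableSize -= size
	return nil
}

// reclaim adds back the size released when a brick is deleted.
func (d *Device) reclaim(size uint64) {
	d.AvailableSize += size
}
```

Volume create would call `claim` for every planned brick, including deferred ones, and Volume delete would call `reclaim` with the same size.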
Notes:
- If the peer hosting the yet-to-be-provisioned brick doesn't come online, the Volume will remain in Replica 2, even though the Volume info shows replica 3.
- All other functionality is unaffected, since Volume info will contain the required information about the brick that will be provisioned later.
@atinmu @oshankkumar @kshlm @amarts @Madhu-1 Let me know your thoughts.