Support for LV types (RAID0/1/5/10)
Describe the problem/challenge you have

I'd like to be able to use lvm-localpv with lvmraid (LVM native RAID) to take advantage of RAID on a VolumeGroup with multiple drives. Unfortunately, LVM does not allow setting a default Logical Volume type on a VolumeGroup (which is probably for the best, complexity-wise), so when using lvm-localpv the only way to enforce a non-default type is currently via thinProvision.
Describe the solution you'd like

Support for mirroring and other RAID configurations in StorageClass parameters for lvm-localpv.

Along with support for RAID configurations, I'd like support for the --raidintegrity option to allow for checksumming of data on disk.
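For concreteness, here is a rough sketch of the kind of lvcreate calls this would map to, assuming flag spellings from lvcreate(8) and placeholder VG/LV names and sizes:

```sh
# Two-way mirror with dm-integrity checksums enabled.
lvcreate --type raid1 --mirrors 1 --raidintegrity y --name pvc-example --size 10G lvmvg

# raid5 volume with three data stripes (needs at least four PVs in the VG).
lvcreate --type raid5 --stripes 3 --stripesize 64k --name pvc-example --size 10G lvmvg
```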
Anything else you would like to add: I'd like to work on this, and have a few questions/comments:
- Should I write a design doc? (I'd probably mirror the one for thin provisioning.)
- Initially I'm only planning on supporting raid0/stripe, raid/raid1/mirror, raid5, raid6, and raid10.
- VolumeInfo (which maybe should be named VolumeParams) seems to put its own spin on the settings, so I have a few style-related questions:
  - Which is better: Mirror (mirror in YAML) or LvType (lvType in YAML)? Mirror is a bit closer to ThinProvision in spirit, while LvType is a bit more explicit about the passing down of settings.
  - Some of the modes require additional configuration (e.g. --stripes, --mirrors, --stripesize); would that be best as something like MirrorCount (mirrorCount in YAML)?
Currently I'm using a quite well-known workaround: running mdraid (via mdadm, with dm-integrity configured) on the two disks below LVM and giving lvm-localpv a volume group based on /dev/mdX. I'd ultimately like to test LVM-native RAID against mdraid itself as well, so some support would be great.
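Roughly, that workaround looks like the following (device and VG names are placeholders, and the dm-integrity layer is omitted for brevity):

```sh
# Build an mdraid mirror from two disks, then hand the resulting device to LVM
# as the volume group that lvm-localpv consumes.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
pvcreate /dev/md0
vgcreate lvmvg /dev/md0
```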
Environment:
- LVM Driver version:
- Kubernetes version (use kubectl version):
- Kubernetes installer & version:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
Another feature I use with LVM2 RAID is --nosync. When using SSDs that guarantee returning zeros for unwritten LBAs and support unmap/trim, and lvm.conf has issue_discards=1, the RAID doesn't need to be synchronized. This saves both time and SSD wear.
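For reference, a minimal sketch of that at the lvcreate level (placeholder names and sizes):

```sh
# Two-way mirror that skips the initial resync; assumes the SSD guarantees and
# the lvm.conf issue_discards=1 setting described above.
lvcreate --type raid1 --mirrors 1 --nosync --name pvc-example --size 10G lvmvg
```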
I'm going to try to take a stab at this. My plan is to make use of the StorageClass parameters block to allow for setting a RAID enable option. Since PersistentVolumeClaims do not allow for setting CSI-specific parameters, my idea is to use annotations to specify the various RAID options. A rough sketch of how I think it will look is below:
StorageClass definition:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"
  allowRaid: "true"
PersistentVolumeClaim definition:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: csi-lvmpv
  annotations:
    "local.csi.openebs.io/raid": |
      {
        "type": "raid10",
        "integrity": true,
        "mirrors": {
          "count": 2,
          "nosync": true
        },
        "stripes": {
          "count": 2,
          "size": "64Ki"
        }
      }
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-lvm
  resources:
    requests:
      storage: 4Gi
The annotation must be valid JSON; the valid options are shown below:
Option | Required | Valid Values | Description
---|---|---|---
type | true | raid0 / stripe, raid / raid1 / mirror, raid5, raid6, raid10 | The RAID type of the volume.
integrity | false | true, false | Whether or not to enable DM integrity for the volume. Defaults to false.
mirrors | depends | Object | Configuration of mirrors. Certain RAID configurations require this to be set.
mirrors.count | true | [0, ∞) | How many mirrors to enable.
mirrors.nosync | false | true, false | Whether or not to disable the initial sync. Defaults to false.
stripes | depends | Object | Configuration of stripes. Certain RAID configurations require this to be set.
stripes.count | true | [0, ∞) | How many stripes to enable.
stripes.size | false | [0, ∞) (must be a power of 2) | The size of each stripe. If not specified, LVM will choose a sane default.
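As a rough sketch, the example annotation above might translate to an lvcreate call like the one below. The mapping of mirrors.count to --mirrors is an assumption (LVM counts additional copies, and its raid10 currently supports only two-way mirroring), and nosync is omitted for brevity:

```sh
# raid10 with two copies, two 64KiB stripes, and dm-integrity enabled.
# Name and size are placeholders taken from the PVC.
lvcreate --type raid10 --mirrors 1 --stripes 2 --stripesize 64k \
  --raidintegrity y \
  --name pvc-example --size 4G lvmvg
```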
I'll wait a few days before starting, to allow time to discuss changes, but as I really need this on my own cluster I'd like to get started as soon as possible.
I think this approach will work, but it might blur the roles of the folks deploying services. I think of StorageClasses as being owned by the team managing infrastructure, and PVC definitions as owned by the developers or platform teams consuming the provisioned storage. This approach might let a developer writing the YAML for a service or deployment muck up the RAID settings out of ignorance of the underlying infrastructure. Having the developers pick from a list of acceptable StorageClasses seems safer. This implies the PVC RAID annotations should instead be optional StorageClass parameters.
That's an excellent point. I agree with you, but I wasn't sure if it was OK to tie individual logical volume settings to the entire StorageClass. The method you propose would look like the following. For anyone coming to this later, feel free to vote on the preferred approach by reacting to the method you like best with a heart.
With this new approach, the PersistentVolumeClaim would no longer need any extra info, and the owning StorageClass would look as follows (the chart from my previous comment would still apply here):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"
  raid:
    type: raid10
    integrity: true
    mirrors:
      count: 2
      nosync: true
    stripes:
      count: 2
      size: 64Ki
Does that seem more in line with what you were thinking?
Yes, this is more like what I was thinking. I'm on the fence about the structure. I like the specificity of the raid object matching LVM2's approach, but maybe it is tied too tightly to it. Might other RAID solutions want to leverage the SC definition with a change to the "storage:" attribute? Also, there are other, more advanced options like --readahead and --vdo that might be desired. Continually extending and updating the SC definition might become laborious. I would propose keeping the list of defined parameters to those most common to any RAID system and providing a user-beware option to append additional arguments to the lvcreate call. Here is an approach, and I added some comments for the SC spec:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvm
allowVolumeExpansion: true
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  vgpattern: "lvmvg.*"
  # The raidtype parameter is used as the lvcreate --type option.
  # Currently the CSI plug-in supports linear, raid1, raid5, raid6, and raid10. Default is linear.
  raidtype: raid10
  # A true value enables data integrity checksums for raid images, if available.
  integrity: true
  # The number of replicas stored on different physical volumes for the PVC.
  mirrors: 2
  # Skip initial synchronization of the storage devices and assume all data has been erased to zero on the device.
  nosync: true
  # Stripe count is the number of devices or physical volumes to be used by the logical volume created for the PVC.
  stripecount: 2
  # Stripe size is the amount of data that is written to one device before moving to the next device.
  # The value must be an integer number of bytes, optionally followed by a unit multiplier with no space.
  stripesize: 64Ki
  # LVM2 create options such as --vdo and --readahead auto will be applied when the logical volume is created for the PVC.
  # No validation is performed on this option string, so any errors or misunderstandings of LVM2 could cause failures to create PVCs.
  lvcreateoptions: "--vdo --readahead auto"
When coding the appending of lvcreateoptions we will need to make sure to strip off special characters like "|" and ";" for security reasons.
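A minimal sketch of that idea, with illustrative variable names that are not part of the driver:

```sh
# Strip shell metacharacters from the user-supplied option string, then append
# it (unquoted, so it splits into separate arguments) to the composed lvcreate
# call. Whether LVM accepts the appended options is left to the user, per the
# caveat above. mirrors: 2 replicas is assumed to map to LVM's --mirrors 1.
lvcreateoptions='--readahead auto'
safe_opts=$(printf '%s' "$lvcreateoptions" | tr -d ';|&$`')
lvcreate --type raid10 --mirrors 1 --stripes 2 --stripesize 64k \
  --raidintegrity y $safe_opts \
  --name pvc-example --size 4G lvmvg
```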
Ok, that makes sense! Do we want to group the options under a top-level raid option, as in my previous post? Some of the options could be at the top level (like lvCreateOptions), but most of the other options are very specific to RAID. Regardless, I'll start working on this and make a PR. Once we have that, we can discuss specifics there.
Thanks for trying to take this on @nicholascioli -- I am actually not using lvm-localpv anymore, but it's great to see someone take a stab at this.
As a quick aside, how should I handle it if someone has requested a thin-provisioned volume with RAID? I think that lvm2 does not allow that, as the --type flag does not support multiple types at the same time.
Philosophically, I don't like the idea of the CSI driver trying to enforce LVM2 constraints. What if LVM2 in RedHat 8.8 behaves differently than in RedHat 9.2? My opinion is to document that the LVM2 configuration of the OS is the source of truth for which options can be used on a given node. When the CSI driver doesn't get a good status from the lvcreate call, it should log the lvcreate command arguments and stderr so the stuck PV create can be debugged.