nvmeof: QoS Support for NVMe-oF CSI Driver
Overview
Add QoS (Quality of Service) support for NVMe-oF namespaces, allowing users to control IOPS and bandwidth limits both at volume creation and during runtime.
Proposed Implementation
1. QoS at Volume Creation (StorageClass)
Set initial QoS limits via StorageClass parameters. If omitted, namespaces remain unlimited.
Example StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-nvmeof-standard
provisioner: nvmeof.csi.ceph.com
parameters:
  pool: mypool
  # Optional QoS parameters
  qosRwIopsPerSecond: "10000"
  qosRwMegabytesPerSecond: "100"
  qosReadMegabytesPerSecond: "150"
  qosWriteMegabytesPerSecond: "50"
Implementation: Modify ControllerCreateVolume() to:
- Parse QoS parameters from the StorageClass
- Call the ns set_qos API after namespace creation, if QoS parameters are present (see the sketch below)
- Handle missing parameters gracefully (no QoS = unlimited)
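A minimal Go sketch of the parsing step, assuming the parameter keys from the StorageClass example above. The package name, the qosLimits struct, and the parseQoSParams helper are illustrative assumptions, not existing ceph-csi code; the driver would feed the result into its namespace-creation path.

```go
package nvmeof

import (
	"fmt"
	"strconv"
)

// qosLimits carries the absolute limits handed to the NVMe-oF gateway
// (hypothetical type for this sketch).
type qosLimits struct {
	rwIOPS, rwMBps, readMBps, writeMBps uint64
	set                                 bool // true if any QoS key was supplied
}

// parseQoSParams extracts the optional QoS keys from StorageClass (or
// VolumeAttributesClass) parameters. Missing keys stay at 0 (unlimited);
// malformed values are rejected.
func parseQoSParams(params map[string]string) (*qosLimits, error) {
	q := &qosLimits{}
	for key, dst := range map[string]*uint64{
		"qosRwIopsPerSecond":         &q.rwIOPS,
		"qosRwMegabytesPerSecond":    &q.rwMBps,
		"qosReadMegabytesPerSecond":  &q.readMBps,
		"qosWriteMegabytesPerSecond": &q.writeMBps,
	} {
		val, ok := params[key]
		if !ok {
			continue
		}
		n, err := strconv.ParseUint(val, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid value %q for %s: %w", val, key, err)
		}
		*dst = n
		q.set = true
	}
	return q, nil
}
```

ControllerCreateVolume() would call parseQoSParams() on the request parameters and, only when q.set is true, issue the gateway's ns set_qos call right after creating the namespace.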
2. Runtime QoS Modification (VolumeAttributesClass)
Enable QoS changes on existing volumes without recreation using CSI ControllerModifyVolume().
Example VolumeAttributesClass:
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttributesClass
metadata:
  name: high-performance
driverName: nvmeof.csi.ceph.com
parameters:
  qosRwIopsPerSecond: "50000"
  qosRwMegabytesPerSecond: "500"
Usage: to hook an existing PVC up to the VolumeAttributesClass, patch the PVC to reference it:
# Apply QoS to existing PVC
kubectl patch pvc my-pvc -p '{"spec":{"volumeAttributesClassName":"high-performance"}}'
Implementation: Add the ControllerModifyVolume() RPC to:
- Parse QoS parameters from the VolumeAttributesClass
- Call ns set_qos via gRPC to the gateway (see the sketch after this list)
- Deploy the csi-resizer sidecar to monitor VAC changes
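A rough sketch of what the new RPC could look like, reusing parseQoSParams from the sketch above. The gatewayClient interface, its SetNamespaceQoS method, and the lookupNamespace helper are assumptions for illustration only; the real ceph-nvmeof gateway gRPC surface and the driver's volume-ID handling will differ.

```go
package nvmeof

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// gatewayClient abstracts the single gateway call used in this sketch
// (hypothetical interface, not the real gateway gRPC client).
type gatewayClient interface {
	SetNamespaceQoS(ctx context.Context, subsystemNQN string, nsID uint32, q *qosLimits) error
}

type controllerServer struct {
	gw gatewayClient
}

// ControllerModifyVolume applies the QoS limits carried in the mutable
// parameters, which the external-resizer populates from the
// VolumeAttributesClass referenced by the PVC.
func (cs *controllerServer) ControllerModifyVolume(
	ctx context.Context, req *csi.ControllerModifyVolumeRequest,
) (*csi.ControllerModifyVolumeResponse, error) {
	q, err := parseQoSParams(req.GetMutableParameters())
	if err != nil {
		return nil, status.Error(codes.InvalidArgument, err.Error())
	}
	if !q.set {
		// No QoS keys in the request: nothing to change.
		return &csi.ControllerModifyVolumeResponse{}, nil
	}

	// Map the CSI volume ID back to its subsystem NQN and namespace ID.
	subsystemNQN, nsID, err := lookupNamespace(ctx, req.GetVolumeId())
	if err != nil {
		return nil, status.Error(codes.NotFound, err.Error())
	}
	if err := cs.gw.SetNamespaceQoS(ctx, subsystemNQN, nsID, q); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	return &csi.ControllerModifyVolumeResponse{}, nil
}

// lookupNamespace is a placeholder: the real driver would derive the
// subsystem NQN and namespace ID from its volume journal / volume ID.
func lookupNamespace(ctx context.Context, volumeID string) (string, uint32, error) {
	return "", 0, status.Error(codes.Unimplemented, "not implemented in this sketch")
}
```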
Supported QoS Parameters
All parameters map directly to the NVMe-oF gateway's ns set_qos command:
- qosRwIopsPerSecond - R/W IOPS limit (0 = unlimited)
- qosRwMegabytesPerSecond - R/W bandwidth limit in MB/s (0 = unlimited)
- qosReadMegabytesPerSecond - Read bandwidth limit in MB/s (0 = unlimited)
- qosWriteMegabytesPerSecond - Write bandwidth limit in MB/s (0 = unlimited)
Requirements
- Kubernetes 1.29+ (for VolumeAttributesClass support)
- CSI spec 1.9.0+ (for ControllerModifyVolume)
- MODIFY_VOLUME controller capability
- csi-resizer sidecar container
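The driver also has to advertise the MODIFY_VOLUME capability so the csi-resizer sidecar routes VAC changes to ControllerModifyVolume. A sketch, assuming the hypothetical controllerServer from above, a CSI Go package at spec 1.9.0 or newer (which defines the MODIFY_VOLUME constant), and a trimmed capability list for brevity:

```go
// ControllerGetCapabilities advertises MODIFY_VOLUME alongside the existing
// capabilities. Sketch only; the real driver already has its own capability
// list to extend.
func (cs *controllerServer) ControllerGetCapabilities(
	ctx context.Context, req *csi.ControllerGetCapabilitiesRequest,
) (*csi.ControllerGetCapabilitiesResponse, error) {
	newCap := func(t csi.ControllerServiceCapability_RPC_Type) *csi.ControllerServiceCapability {
		return &csi.ControllerServiceCapability{
			Type: &csi.ControllerServiceCapability_Rpc{
				Rpc: &csi.ControllerServiceCapability_RPC{Type: t},
			},
		}
	}
	return &csi.ControllerGetCapabilitiesResponse{
		Capabilities: []*csi.ControllerServiceCapability{
			newCap(csi.ControllerServiceCapability_RPC_CREATE_DELETE_VOLUME),
			newCap(csi.ControllerServiceCapability_RPC_MODIFY_VOLUME),
		},
	}, nil
}
```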
@gadididi Are the NVMe-oF QoS keys similar to the NBD QoS keys? We already have QoS for NBD: https://github.com/ceph/ceph-csi/blob/72c09d3d8758d058575d34b2da4b09eb0a591f8f/examples/rbd/storageclass.yaml#L168-L217. Can we have the same or similar keys, so that we can reuse a lot of internal functions, and so that it is easier for users to have the same keys in the SC?
@Madhu-1 Hi, sure, sharing common code would be good if we can. The current QoS parameters for NVMe-oF are:
Set QOS limits for a namespace
optional arguments:
-h, --help show this help message and exit
--subsystem SUBSYSTEM, -n SUBSYSTEM
Subsystem NQN
--nsid NSID Namespace ID
--rw-ios-per-second RW_IOS_PER_SECOND
R/W IOs per second limit, 0 means unlimited
--rw-megabytes-per-second RW_MEGABYTES_PER_SECOND
R/W megabytes per second limit, 0 means unlimited
--r-megabytes-per-second R_MEGABYTES_PER_SECOND
Read megabytes per second limit, 0 means unlimited
--w-megabytes-per-second W_MEGABYTES_PER_SECOND
Write megabytes per second limit, 0 means unlimited
--force Set QOS limits even if they were changed by RBD
There is a parameter named --force. I need to check the consequences of using the same keys in the SC, and I will let you know.
There is also a requirement to modify the volume (= NVMe-oF namespace) "on the fly", so do you think ControllerModifyVolume() is the proper solution for it?
so do you think ControllerModifyVolume() is the proper solution for it?
Yes, that's correct. We can support changing the QoS without any requirement like a remount or other node operations; it's the way to go.
@Madhu-1 Hi!!,
Response: Should we reuse RBD QoS keys for NVMe-oF?
After looking into this more deeply, I don't think we should reuse the RBD QoS keys for NVMe-oF. Here's why:
The RBD QoS parameters like baseIops, maxIops, and iopsPerGiB are designed for a capacity-based calculation model where the QoS limits scale dynamically with the volume size. This makes sense for RBD because the QoS is applied at the image level in the storage backend.
NVMe-oF gateway QoS works completely differently. It's applied at the network/SPDK layer and just takes static absolute values - there's no calculation or scaling involved. You just tell it "limit this to 10000 IOPS" and that's what it does, regardless of volume size.
More importantly, these two QoS mechanisms don't actually work well together. If an RBD image already has QoS configured, the NVMe-oF gateway QoS won't do anything unless you use the --force flag, which isn't recommended. They're operating at different layers and can conflict with each other.
So for NVMe-oF, I think we should use simple, descriptive parameters like nvmeofRwIopsPerSecond and nvmeofRwMegabytesPerSecond. The implementation is straightforward - we just pass these values directly to the gateway API (via the GRPC) without any calculation. Different keys will also make it clear to users that this is a different QoS mechanism.
What do you think?
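For illustration, if keys along these lines are adopted, the parse sketch earlier in this issue would only need its key strings swapped. The constant names below are hypothetical, and the read/write variants simply follow the same naming pattern by analogy with the earlier qos* parameters:

```go
// Proposed NVMe-oF-specific parameter keys, deliberately distinct from the
// RBD QoS keys. Values are passed to the gateway as-is via gRPC, with no
// capacity-based scaling.
const (
	paramRwIopsPerSecond         = "nvmeofRwIopsPerSecond"
	paramRwMegabytesPerSecond    = "nvmeofRwMegabytesPerSecond"
	paramReadMegabytesPerSecond  = "nvmeofReadMegabytesPerSecond"
	paramWriteMegabytesPerSecond = "nvmeofWriteMegabytesPerSecond"
)
```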
@gadididi Thanks for the detailed explanation. That makes sense: both are completely different implementations and use different keys, so we can have different keys in the SC for this driver.
A few more things to do once #5614 is merged:
- add example YAML files for a VolumeAttributesClass with QoS limits and one that removes the limits
- add e2e testing, depends on #5641
- document the feature, and its dependency on Kubernetes 1.34 (and maybe kubernetes-csi/external-provisioner#1440 and kubernetes-csi/external-resizer#544)