ceph-csi

nvmeof: DH-CHAP Authentication for NVMe-oF CSI Driver

Open gadididi opened this issue 1 month ago • 3 comments

Describe the feature you'd like to have

Add DH-HMAC-CHAP (Diffie-Hellman Challenge Handshake Authentication Protocol) authentication support to the Ceph NVMe-oF CSI driver, enabling secure authentication between Kubernetes nodes and NVMe-oF storage subsystems. This feature is required by the PM.

What is DH-CHAP?

DH-CHAP is a security protocol defined in the NVMe specification that authenticates connections between hosts (initiators) and storage controllers. Think of it as requiring a password before allowing access to storage, but cryptographically secure.

Key Concepts

  • Host (Initiator): The Kubernetes worker node that wants to access storage
  • Subsystem (Target/Controller): The storage system exposing NVMe volumes through the gateway
  • DH-CHAP Key: A cryptographic secret used to prove identity during authentication
  • Bidirectional Authentication: Both the host and storage controller prove their identities to each other (strongly recommended for production)
  • Unidirectional Authentication: Only the host proves its identity to the storage controller

What is the value to the end user? (why is it a priority?)

Security Benefits

  1. Multi-Tenant Isolation: In shared Kubernetes clusters, DH-CHAP ensures that workloads from one tenant cannot access storage belonging to another tenant, even if they share the same physical network.

  2. Man-in-the-Middle Protection: Bidirectional authentication ensures that hosts connect to legitimate storage controllers, not malicious imposters on the network.

How will we know we have a good solution? (acceptance criteria)

Functional Requirements

1. StorageClass Configuration

  • Support enable_dhchap: "true/false" parameter to enable/disable authentication
  • Support dhchap_mode: "bidirectional/unidirectional" parameter to select authentication mode
  • Backward compatible: existing StorageClasses without DH-CHAP continue to work
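As a rough illustration of how the two parameters above could be resolved into one effective mode, here is a minimal Python sketch (the driver itself is not written in Python). The function name `parse_dhchap_mode`, the `DhchapMode` enum, and the rule that `dhchap_mode` is mandatory once `enable_dhchap` is "true" are assumptions for illustration, not part of the proposal.

```python
from enum import Enum
from typing import Mapping


class DhchapMode(Enum):
    NONE = "none"
    UNIDIRECTIONAL = "unidirectional"
    BIDIRECTIONAL = "bidirectional"


def parse_dhchap_mode(parameters: Mapping[str, str]) -> DhchapMode:
    """Resolve the effective DH-CHAP mode from StorageClass parameters.

    A StorageClass without any DH-CHAP parameter resolves to NONE, which
    keeps existing StorageClasses working unchanged.
    """
    enabled = parameters.get("enable_dhchap", "false").strip().lower()
    if enabled not in ("true", "false"):
        raise ValueError(f"enable_dhchap must be 'true' or 'false', got {enabled!r}")
    if enabled == "false":
        return DhchapMode.NONE

    # Assumption: no implicit default mode once authentication is enabled.
    mode = parameters.get("dhchap_mode")
    if mode not in ("unidirectional", "bidirectional"):
        raise ValueError(
            "dhchap_mode must be 'unidirectional' or 'bidirectional' "
            f"when enable_dhchap is 'true', got {mode!r}"
        )
    return DhchapMode(mode)
```

Failing fast on an unknown value at provisioning time is one option; silently falling back to no authentication would be harder to notice.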

2. Key Management

  • Automatically generate cryptographically secure DH-CHAP keys
  • Store keys securely in Kubernetes Secrets
  • One unique key per node-subsystem connection (isolation)
  • One subsystem key per subsystem (for bidirectional mode)
  • Automatic key cleanup when resources are deleted
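Key generation could look roughly like the sketch below. It assumes the textual `DHHC-1` secret layout as I understand it from `nvme gen-dhchap-key`: a `DHHC-1:` prefix, a two-digit hash identifier (`00` meaning no transform), the base64 encoding of the key followed by its little-endian CRC-32, and a trailing colon. That layout should be checked against nvme-cli and the gateway before relying on it; the function name is hypothetical.

```python
import base64
import secrets
import zlib


def generate_dhchap_secret(key_len: int = 32) -> str:
    """Generate a random DH-CHAP secret in the assumed DHHC-1 text format."""
    if key_len not in (32, 48, 64):
        raise ValueError("key_len must be 32, 48, or 64 bytes")
    # secrets.token_bytes draws from the OS cryptographically secure RNG.
    key = secrets.token_bytes(key_len)
    # Assumption: CRC-32 of the key is appended as 4 little-endian bytes.
    crc = zlib.crc32(key).to_bytes(4, "little")
    encoded = base64.b64encode(key + crc).decode("ascii")
    return f"DHHC-1:00:{encoded}:"
```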

3. Volume Lifecycle Operations (when enable_dhchap: "true")

  • CreateVolume: Generate subsystem key (if bidirectional mode) and create subsystem with authentication
  • ControllerPublishVolume: Generate/retrieve host key and add host to gateway with authentication
  • NodeStageVolume: Retrieve keys from secrets and connect with nvme connect --dhchap-secret command
  • ControllerUnpublishVolume: Remove host and delete host key when last namespace detached
  • DeleteVolume: Delete subsystem key when last namespace deleted
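The "when last namespace detached" rule amounts to reference counting per node-subsystem pair. A minimal in-memory sketch of that bookkeeping follows; the class and method names are hypothetical, and a real driver would have to derive this state from something durable (for example the gateway's own host and namespace listings) rather than from process memory.

```python
from collections import defaultdict


class HostKeyTracker:
    """Tracks which volumes are attached per (node, subsystem) pair."""

    def __init__(self) -> None:
        self._attached = defaultdict(set)

    def publish(self, node_id: str, subsystem_nqn: str, volume_id: str) -> bool:
        """Record an attachment.

        Returns True when this is the first volume for the pair, i.e. the
        host key has to be generated and the host added to the gateway.
        """
        volumes = self._attached[(node_id, subsystem_nqn)]
        first = not volumes
        volumes.add(volume_id)
        return first

    def unpublish(self, node_id: str, subsystem_nqn: str, volume_id: str) -> bool:
        """Record a detachment.

        Returns True when the last volume for the pair was detached, i.e.
        the host can be removed and its key deleted.
        """
        pair = (node_id, subsystem_nqn)
        volumes = self._attached.get(pair)
        if volumes is None:
            return False
        volumes.discard(volume_id)
        if volumes:
            return False
        del self._attached[pair]
        return True
```

Both calls are idempotent for a repeated volume ID, which matters because CSI RPCs may be retried.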

4. Authentication Modes

  • Bidirectional: Host and subsystem both authenticate (mutual authentication)
  • Unidirectional: Only host authenticates to subsystem
  • None: No authentication (backward compatibility)

5. Error Handling

  • Clear error messages when authentication fails
  • Graceful handling of missing or invalid keys

Non-Functional Requirements

6. Security

  • Keys generated using cryptographically secure random number generators
  • Keys never logged or exposed in error messages
  • Secrets use restrictive RBAC permissions
  • Different keys for different connections (no key reuse across subsystems)
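One way to approach "keys never logged" is to scrub anything that looks like a DH-CHAP secret before a command line or error string reaches the logger. The sketch below assumes secrets use the `DHHC-1:<hh>:<base64>:` text form; the helper name `redact_dhchap` is hypothetical.

```python
import re

# Matches the assumed textual secret form: DHHC-1:<2 hex digits>:<base64>:
_DHCHAP_SECRET_RE = re.compile(r"DHHC-1:[0-9A-Fa-f]{2}:[A-Za-z0-9+/=]+:")


def redact_dhchap(text: str) -> str:
    """Replace every DH-CHAP secret found in text with a fixed marker."""
    return _DHCHAP_SECRET_RE.sub("DHHC-1:<redacted>", text)
```

Pattern-based redaction is only a safety net; not passing secrets into loggable strings in the first place is the stronger guarantee.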

7. Compatibility

  • Compatible with Linux kernel NVMe driver (nvme-tcp module)
  • No breaking changes to existing CSI API

Additional context

1. NOTE: The gateway offers an option to update the DH-CHAP key for a host or subsystem. That option is not covered here. We need to discuss how we want to handle key updates; it is not part of this phase.

Architecture Highlights

Key Design Decisions

  1. One Key Per Node-Subsystem Connection: Each unique node-subsystem pair gets its own DH-CHAP key. This provides:

    • Better security isolation (compromised key only affects one connection)
    • Simple cleanup logic (delete key when connection removed)
    • Per-subsystem access control
  2. Key Persistence: Keys are stored in Kubernetes Secrets and persist across pod restarts, ensuring consistent authentication without regenerating keys unnecessarily.
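For the per-connection key to be found again after a pod restart, the Secret holding it needs a name that can be recomputed from the node and subsystem alone. A minimal sketch of one such naming scheme is shown below; the `nvmeof-dhchap-host-` prefix, the function name, and the use of a truncated SHA-256 digest are all assumptions for illustration.

```python
import hashlib


def host_key_secret_name(node_id: str, subsystem_nqn: str) -> str:
    """Derive a deterministic Secret name for one node-subsystem pair.

    NQNs contain characters such as ':' and uppercase letters that are not
    valid in Kubernetes object names, so the pair is hashed instead of
    being embedded verbatim.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    material = f"{node_id}\0{subsystem_nqn}".encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()
    return f"nvmeof-dhchap-host-{digest[:32]}"
```

The readable node ID and NQN could additionally be stored as labels or annotations on the Secret so that cleanup and debugging do not depend on reversing the hash.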

Implementation References

  • Ceph NVMe-oF Gateway: Supports DH-CHAP via subsystem add --dhchap-key and host add --dhchap-key commands
  • Linux nvme-cli: Supports DH-CHAP via nvme connect with --dhchap-secret and --dhchap-ctrl-secret options
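To make the node-side call concrete, the sketch below only assembles the `nvme connect` argument list and does not run it. The long options used are existing nvme-cli flags; the function name and parameter names are hypothetical. Note the assumption baked in here: passing secrets as command-line arguments makes them visible in the process list, so an actual implementation may want a different delivery mechanism.

```python
from typing import List, Optional


def build_nvme_connect_argv(
    traddr: str,
    trsvcid: str,
    subsystem_nqn: str,
    host_nqn: str,
    host_secret: str,
    ctrl_secret: Optional[str] = None,
) -> List[str]:
    """Assemble an `nvme connect` command line for NVMe/TCP with DH-CHAP.

    host_secret is always passed (unidirectional mode); ctrl_secret is
    added only for bidirectional mode.
    """
    argv = [
        "nvme", "connect",
        "--transport", "tcp",
        "--traddr", traddr,
        "--trsvcid", trsvcid,
        "--nqn", subsystem_nqn,
        "--hostnqn", host_nqn,
        "--dhchap-secret", host_secret,
    ]
    if ctrl_secret is not None:
        argv += ["--dhchap-ctrl-secret", ctrl_secret]
    return argv
```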

Example StorageClass Configuration

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-nvmeof-secure
provisioner: nvmeof.csi.ceph.com
parameters:
  # Existing parameters
  subsystemNQN: "nqn.2024-01.io.ceph:csi"
  nvmeofGatewayAddress: "192.168.1.100"
  nvmeofGatewayPort: "5500"
  listeners: '[{"address":"192.168.1.100","port":4420,"hostname":"gw1"}]'
  
  # NEW: DH-CHAP authentication
  enable_dhchap: "true"
  dhchap_mode: "bidirectional"  # Recommended for production

Security Considerations

  • Key Rotation: Future enhancement to support periodic key rotation using gateway's change_host_key and change_subsystem_key APIs
  • Key Storage: Consider integrating with external key management systems (HashiCorp Vault, KMS??) for enhanced key protection
  • Audit Logging: Authentication events should be logged for security monitoring and compliance

Sequence Diagram:

[Image: sequence diagram]

gadididi avatar Nov 04 '25 16:11 gadididi

Image

A single optional dhchap_mode option with values none (default), bidirectional, unidirectional is cleaner.

nixpanic avatar Nov 06 '25 16:11 nixpanic

I have a few questions that I noted down while reading. Some may already be answered, but I am noting them here as well.

  1. Can this be enabled/disabled on the fly?
  2. Can this be added to configmap instead of SC?
  3. "Store keys securely in Kubernetes Secrets": Instead of this, can we use KMS or image metadata for it, like we do for PV encryption, as it will be more secure?
  4. "One unique key per node-subsystem connection (isolation)": Is this key going to be per node or per PV?
  5. "CreateVolume: Generate subsystem key (if bidirectional mode) and create subsystem with authentication": What about the case where the key needs to be rotated?
  6. "None: No authentication (backward compatibility)": IMO we don't need to worry about backward compatibility as it's still in the development phase.
  7. "Key Persistence: Keys are stored in Kubernetes Secrets and persist across pod restarts, ensuring consistent authentication without regenerating keys unnecessarily.": IMO it would be good to consider an external entity instead of us managing the keys. We also need to think about key rotation in the early phase and whether the current design makes sense.

Madhu-1 avatar Nov 10 '25 09:11 Madhu-1

@Madhu-1 Hi!

got few question, i have noted down when started reading it, few might have been answered but noting here as well

  1. Can this be enabled/disabled on the fly?

Subsystem keys and host keys can be added, removed, or changed dynamically via change_subsystem_key_req and change_host_key_req. But if you change the key for a subsystem or host, then when your worker node restarts you must reconnect with the new key. It is highly recommended to disconnect and reconnect with the new keys.

If we would like to support the key modification option, I am not sure which CSI function would fit it.

Can this be added to configmap instead of SC?

It should be per subsystem: you may have some subsystems with DH-CHAP and some without. The SC is where we declare the SubsystemNQN var, so I thought it should be placed there. Why do you think it should move into the configmap?

4 One unique key per node-subsystem connection (isolation) is this Key is going to be per node or per PV?

There is 1 key per subsystem, and 1 key per host (worker node). So, per node-subsystem connection, NOT per PV

Instead of this can we use KMS or image metadata for it? Like we do for PV encryption as it will be more secure?

I was thinking of keeping both keys encrypted somewhere. I need to learn and understand how that works in k8s and similar environments, and also where to keep the KEK (the key used to encrypt both DH-CHAP keys).

5 CreateVolume: Generate subsystem key (if bidirectional mode) and create subsystem with authentication what about the case where Key need to be rotated?

There is an option to update the keys. You are right: we should not keep the same key forever, and should change it every X days or hours.

IMO it would be good to consider external entity instead of we managing the keys.

Do you have any suggestions for it? I will investigate too.

gadididi avatar Dec 10 '25 12:12 gadididi