gcp-compute-persistent-disk-csi-driver

Driver can display the secret content of node-expand-secret-name

Open mpatlasov opened this issue 1 year ago • 12 comments

What happened?

A user with permissions to create or modify a StorageClass may print any Secret.

The problem comes from a klog call that is not sanitized:

	// Note that secrets are not included in any RPC message. In the past protosanitizer and other log
	// stripping was shown to cause a significant increase of CPU usage (see
	// https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/356#issuecomment-550529004).
	klog.V(4).Infof("%s called with request: %s", info.FullMethod, req)

The comment says that secrets are not included in any RPC message; however, there are many RPC messages that may include secrets, for example NodeExpandVolume, as explained here. If one adds:

parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret

to the StorageClass, then after a PV resize one can see the following in the logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Note that there are 7 more RPC message types with secrets listed here.

The GCP CSI driver does not seem to use these features, so the threat may come only from a malicious user with elevated permissions who intentionally wants to expose secrets.

What did you expect to happen?

The log output to be sanitized, e.g.:

I0915 03:29:10.005165       1 utils.go:81] /csi.v1.Node/NodeExpandVolume called with request: {"capacity_range":{"required_bytes":4294967296},"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa","volume_path":"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount"}
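Output in this shape is what the protosanitizer package from kubernetes-csi/csi-lib-utils produces (the same package the code comment above refers to). A minimal sketch of the interceptor with stripping applied, assuming k8s.io/klog/v2 and the standard grpc-go unary interceptor signature (the function name logGRPC is illustrative, not necessarily the driver's actual code):

import (
	"context"

	"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
	"google.golang.org/grpc"
	"k8s.io/klog/v2"
)

// logGRPC logs every unary gRPC request at verbosity 4.
// protosanitizer.StripSecrets wraps the request in a fmt.Stringer that, when
// rendered, replaces CSI secret fields with "***stripped***".
func logGRPC(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	klog.V(4).Infof("%s called with request: %s", info.FullMethod, protosanitizer.StripSecrets(req))
	resp, err := handler(ctx, req)
	if err != nil {
		klog.Errorf("%s returned with error: %v", info.FullMethod, err)
	}
	return resp, err
}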

How can we reproduce it?

  1. Create the Secret:
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: default
stringData:
  username: admin
  password: t0p-Secret
  2. Create the StorageClass:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test5
parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret
  csi.storage.k8s.io/node-expand-secret-namespace: default
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
  3. Create a PVC and a Pod.

  4. Update the CSI driver log level to debug.

  5. Resize the volume capacity.

  6. Check the CSI driver logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Anything else we need to know?

n/a

/cc @msau42, @mattcary, @cjcullen, @tallclair, @jsafrane

mpatlasov avatar Sep 15 '23 03:09 mpatlasov

/assign

I have a patch to sanitize specific RPC request types, but before opening a PR I'd like to get feedback on the issue's severity. The severity probably depends on how likely it is for a user to have permissions high enough to create or modify a StorageClass, yet not as high as a cluster admin, who can print secrets in a simpler way.
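The patch itself is not shown here; purely as an illustration of what sanitizing specific request types could look like (sanitizeForLog and the set of handled cases are hypothetical, not the actual patch, and proto.Clone from google.golang.org/protobuf is assumed), one option is to log a copy of the request with its Secrets field redacted:

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/protobuf/proto"
)

// Hypothetical helper: return a copy of the request that is safe to log.
// Only the request types known to carry secrets need special handling;
// everything else is passed through unchanged.
func sanitizeForLog(req interface{}) interface{} {
	switch r := req.(type) {
	case *csi.NodeExpandVolumeRequest:
		c := proto.Clone(r).(*csi.NodeExpandVolumeRequest)
		c.Secrets = map[string]string{"redacted": "***"}
		return c
	case *csi.NodeStageVolumeRequest:
		c := proto.Clone(r).(*csi.NodeStageVolumeRequest)
		c.Secrets = map[string]string{"redacted": "***"}
		return c
	// ... the remaining secret-carrying request types (ControllerPublishVolume,
	// CreateSnapshot, etc.) would get the same treatment.
	default:
		return req
	}
}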

mpatlasov avatar Sep 15 '23 04:09 mpatlasov

We discussed it at today's sig-storage triage meeting and agreed that it should be fixed by sanitizing the requests, but only when the debug log level is enabled:

  if klog.V(4).Enabled() {
    klog.Infof("%s called with request: %s", info.FullMethod, sanitize(req))
  }

Then it won't get called in production or wherever performance is required.
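For illustration, with protosanitizer.StripSecrets from csi-lib-utils standing in for the sanitize() placeholder (one possible choice; the thread does not settle on a specific implementation):

  if klog.V(4).Enabled() {
    // protosanitizer.StripSecrets wraps req in a fmt.Stringer, so the JSON
    // rendering and secret stripping only happen inside this block, i.e.
    // only when verbosity 4 is enabled.
    klog.Infof("%s called with request: %s", info.FullMethod, protosanitizer.StripSecrets(req))
  }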

jsafrane avatar Sep 20 '23 17:09 jsafrane

For production we actually use log level 4. The pdcsi driver does not actually require any secrets though? Maybe we need to look at ways to disallow/ignore secrets being passed in the first place.

msau42 avatar Sep 20 '23 18:09 msau42

Yes, the CSI driver does not need any secrets; it just logs them and then ignores them. Currently it's not possible to tell the CSI sidecars / kubelet which secrets the driver actually uses, and I am not sure that's the right direction - it could be error-prone when adding a new method / secret.

jsafrane avatar Sep 21 '23 08:09 jsafrane

If there were a CSIDriver field that said whether the driver uses secrets at all, then there could be, e.g., an admission webhook that only lets known drivers that expose secrets be installed (e.g., this driver doesn't use secrets at all).

I understand there'd be backwards compatibility issues with this, but it's an idea.

mattcary avatar Sep 21 '23 15:09 mattcary

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 28 '24 20:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 27 '24 21:02 k8s-triage-robot

/remove-lifecycle rotten

msau42 avatar Mar 09 '24 00:03 msau42