gcp-compute-persistent-disk-csi-driver

Driver can display the secret content of node-expand-secret-name

Open mpatlasov opened this issue 1 year ago • 12 comments

What happened?

A user with permissions to create or modify a StorageClass may print any Secret.

The problem comes from a klog call that is not sanitized:

	// Note that secrets are not included in any RPC message. In the past protosanitizer and other log
	// stripping was shown to cause a significant increase of CPU usage (see
	// https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/356#issuecomment-550529004).
	klog.V(4).Infof("%s called with request: %s", info.FullMethod, req)

The comment says that secrets are not included in any RPC message; however, there are many RPC messages that may include secrets, for example NodeExpandVolume, as explained here. If one adds:

parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret

to the StorageClass, then after a PV resize one can see the following in the logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Note that there are 7 more RPC message types with secrets listed here.

The GCP CSI driver does not seem to use these features, so the threat may come only from a malicious user with elevated permissions who intentionally wants to expose secrets.

What did you expect to happen?

The log output to be sanitized, e.g.:

I0915 03:29:10.005165       1 utils.go:81] /csi.v1.Node/NodeExpandVolume called with request: {"capacity_range":{"required_bytes":4294967296},"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa","volume_path":"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount"}
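Output in this shape is what the protosanitizer package from kubernetes-csi/csi-lib-utils produces (the same package the code comment above refers to). A minimal sketch of the interceptor with stripping applied, assuming k8s.io/klog/v2 and the standard grpc-go unary interceptor signature (the function name logGRPC is illustrative, not necessarily the driver's actual code):

import (
	"context"

	"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
	"google.golang.org/grpc"
	"k8s.io/klog/v2"
)

// logGRPC logs every unary gRPC request at verbosity 4.
// protosanitizer.StripSecrets wraps the request in a fmt.Stringer that, when
// rendered, replaces CSI secret fields with "***stripped***".
func logGRPC(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	klog.V(4).Infof("%s called with request: %s", info.FullMethod, protosanitizer.StripSecrets(req))
	resp, err := handler(ctx, req)
	if err != nil {
		klog.Errorf("%s returned with error: %v", info.FullMethod, err)
	}
	return resp, err
}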

How can we reproduce it?

  1. Create the Secret:
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: default
stringData:
  username: admin
  password: t0p-Secret
  2. Create the StorageClass:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test5
parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret
  csi.storage.k8s.io/node-expand-secret-namespace: default
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
  3. Create a PVC and a Pod.

  4. Update the CSI driver log level to debug.

  5. Resize the volume capacity.

  6. Check the CSI driver logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Anything else we need to know?

n/a

/cc @msau42, @mattcary, @cjcullen, @tallclair, @jsafrane

mpatlasov avatar Sep 15 '23 03:09 mpatlasov

/assign

I have a patch to sanitize specific RPC request types, but before opening a PR I'd like to get feedback on the issue's severity. The severity probably depends on how likely it is for a user to have permissions high enough to create or modify a StorageClass, yet not as high as a cluster admin, who can print secrets in a simpler way.
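The patch itself is not shown here; purely as an illustration of what sanitizing specific request types could look like (sanitizeForLog and the set of handled cases are hypothetical, not the actual patch, and proto.Clone from google.golang.org/protobuf is assumed), one option is to log a copy of the request with its Secrets field redacted:

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/protobuf/proto"
)

// Hypothetical helper: return a copy of the request that is safe to log.
// Only the request types known to carry secrets need special handling;
// everything else is passed through unchanged.
func sanitizeForLog(req interface{}) interface{} {
	switch r := req.(type) {
	case *csi.NodeExpandVolumeRequest:
		c := proto.Clone(r).(*csi.NodeExpandVolumeRequest)
		c.Secrets = map[string]string{"redacted": "***"}
		return c
	case *csi.NodeStageVolumeRequest:
		c := proto.Clone(r).(*csi.NodeStageVolumeRequest)
		c.Secrets = map[string]string{"redacted": "***"}
		return c
	// ... the remaining secret-carrying request types (ControllerPublishVolume,
	// CreateSnapshot, etc.) would get the same treatment.
	default:
		return req
	}
}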

mpatlasov avatar Sep 15 '23 04:09 mpatlasov

We discussed it at today's sig-storage triage meeting and agreed that it should be fixed by sanitizing the requests, but only when the debug log level is enabled:

  if klog.V(4).Enabled() {
    klog.Infof("%s called with request: %s", info.FullMethod, sanitize(req))
  }

Then it won't get called in production or wherever performance is required.
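For illustration, with protosanitizer.StripSecrets from csi-lib-utils standing in for the sanitize() placeholder (one possible choice; the thread does not settle on a specific implementation):

  if klog.V(4).Enabled() {
    // protosanitizer.StripSecrets wraps req in a fmt.Stringer, so the JSON
    // rendering and secret stripping only happen inside this block, i.e.
    // only when verbosity 4 is enabled.
    klog.Infof("%s called with request: %s", info.FullMethod, protosanitizer.StripSecrets(req))
  }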

jsafrane avatar Sep 20 '23 17:09 jsafrane

For production we actually use log level 4. The pdcsi driver does not actually require any secrets though? Maybe we need to look at ways to disallow/ignore secrets being passed in the first place.

msau42 avatar Sep 20 '23 18:09 msau42

Yes, the CSI driver does not need any secrets; it just logs them and then ignores them. Currently it's not possible to tell the CSI sidecars / kubelet which secrets the driver actually uses, and I am not sure that's the right direction - it could be error-prone when adding a new method / secret.

jsafrane avatar Sep 21 '23 08:09 jsafrane

If there were a CSIDriver field that said whether the driver uses secrets at all, then there could be, e.g., an admission webhook that only lets known drivers that expose secrets be installed (e.g., this driver doesn't use secrets at all).

I understand there'd be backwards compatibility issues with this, but it's an idea.

mattcary avatar Sep 21 '23 15:09 mattcary

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 28 '24 20:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 27 '24 21:02 k8s-triage-robot

/remove-lifecycle rotten

msau42 avatar Mar 09 '24 00:03 msau42