gcp-compute-persistent-disk-csi-driver
Driver can log the content of the Secret referenced by node-expand-secret-name
What happened?
A user with permissions to create or modify a StorageClass can get the driver to print the content of an arbitrary Secret in its logs.

The problem comes from a klog print that is not sanitized:
```go
// Note that secrets are not included in any RPC message. In the past protosanitizer and other log
// stripping was shown to cause a significant increase of CPU usage (see
// https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/356#issuecomment-550529004).
klog.V(4).Infof("%s called with request: %s", info.FullMethod, req)
```
The comment says that secrets are not included in any RPC message; however, there are many RPC messages that might include secrets, for example NodeExpandVolume, as explained here. If one adds:

```yaml
parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret
```

to the StorageClass, then after a PV resize one can see in the logs:
```
I0915 03:09:16.116829 1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" >
```
Note that there are 7 more RPC message types that can carry secrets, listed here.

The GCP CSI driver does not seem to use these features, so the threat can come only from a malicious user with elevated permissions who intentionally wants to expose secrets.
What did you expect to happen?
The log print should be sanitized, e.g.:

```
I0915 03:29:10.005165 1 utils.go:81] /csi.v1.Node/NodeExpandVolume called with request: {"capacity_range":{"required_bytes":4294967296},"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa","volume_path":"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount"}
```
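Output in that shape is what the protosanitizer package from kubernetes-csi/csi-lib-utils produces: it replaces fields marked as secrets in the CSI proto with "***stripped***". A minimal sketch, assuming the driver is willing to re-adopt that dependency despite the CPU concerns cited in the comment above:

```go
package main

import (
	"flag"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
	"k8s.io/klog/v2"
)

func main() {
	// Enable verbosity 4 so the V(4) line below is actually emitted.
	klog.InitFlags(nil)
	_ = flag.Set("v", "4")
	flag.Parse()

	// A request carrying secrets, mirroring the NodeExpandVolume call above.
	req := &csi.NodeExpandVolumeRequest{
		VolumeId: "projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa",
		Secrets:  map[string]string{"username": "admin", "password": "t0p-Secret"},
	}

	// StripSecrets wraps the message in a fmt.Stringer; fields marked with the
	// csi_secret option in the CSI proto are replaced with "***stripped***"
	// when (and only when) the message is stringified for an emitted log line.
	klog.V(4).Infof("/csi.v1.Node/NodeExpandVolume called with request: %s",
		protosanitizer.StripSecrets(req))
}
```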
How can we reproduce it?
- Create a Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: default
stringData:
  username: admin
  password: t0p-Secret
```
- Create a StorageClass:

```yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test5
parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret
  csi.storage.k8s.io/node-expand-secret-namespace: default
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```
- Create a PVC and pod (illustrative manifests are sketched after the log output below)
- Update the CSI driver log level to debug (verbosity 4)
- Resize the volume capacity
- Check the CSI driver logs:
```
I0915 03:09:16.116829 1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" >
```
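For the PVC/pod step, manifests along the following lines would work. The object names and the initial 2Gi request are illustrative (not taken from the issue); storageClassName matches the test5 class above, and the pod keeps the volume mounted so that the resize goes through NodeExpandVolume on the node:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: test5
  resources:
    requests:
      storage: 2Gi   # later patched to 3Gi to trigger NodeExpandVolume
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
```

Resizing is then a matter of patching the claim, e.g. `kubectl patch pvc test-pvc -p '{"spec":{"resources":{"requests":{"storage":"3Gi"}}}}'`.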
Anything else we need to know?
n/a
/cc @msau42, @mattcary, @cjcullen, @tallclair, @jsafrane
/assign
I have a patch to sanitize specific RPC request types, but before opening a PR I'd like to get feedback on the issue severity. The severity probably depends on the possibility of having permissions high enough to create or modify a StorageClass, yet not as high as a cluster admin, who can print secrets in a simpler way.
We discussed it at today's sig-storage triage meeting and agreed that it should be fixed by sanitizing requests, but only when the debug log level is enabled:

```go
if klog.V(4).Enabled() {
	klog.Infof("%s called with request: %s", info.FullMethod, sanitize(req))
}
```
Then it won't get called in production or wherever performance is required.
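Assembled as the driver's gRPC logging interceptor, that approach might look like the sketch below; logGRPC is a hypothetical name here, and protosanitizer.StripSecrets from csi-lib-utils stands in for the sanitize() placeholder above:

```go
import (
	"context"

	"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
	"google.golang.org/grpc"
	"k8s.io/klog/v2"
)

// logGRPC is a sketch of a gRPC unary server interceptor for the agreed
// approach: the sanitized string is built only when verbosity 4 is enabled,
// so deployments running below that level never pay the sanitization cost.
func logGRPC(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (interface{}, error) {
	if klog.V(4).Enabled() {
		klog.Infof("%s called with request: %s",
			info.FullMethod, protosanitizer.StripSecrets(req))
	}
	resp, err := handler(ctx, req)
	if err != nil {
		klog.Errorf("%s returned with error: %v", info.FullMethod, err)
	}
	return resp, err
}
```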
For production we actually use log level 4. The pdcsi driver does not actually require any secrets though? Maybe we need to look at ways to disallow/ignore secrets being passed in the first place.
Yes, the CSI driver does not need any secrets; it just logs them and then ignores them. Currently it's not possible to tell the CSI sidecars / kubelet which secrets the driver actually uses, and I am not sure it's the right direction - it could be error prone when adding a new method / secret.
If there was a CSIDriver field that said whether the driver used secrets at all, then there could be, e.g., an admission webhook that only lets known drivers that use secrets be installed (e.g., this driver doesn't use secrets at all).
I understand there'd be backwards compatibility issues with this, but it's an idea.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten