mondoo-operator

The operator fails to scan GKE autopilot clusters

Open czunker opened this issue 1 year ago • 4 comments

Describe the bug When the operator is deployed in a GKE autopilot cluster, it does not report any assets.

https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview

This needs to be fixed because the new default for GKE clusters is autopilot: https://cloud.google.com/blog/products/containers-kubernetes/gke-autopilot-is-now-default-mode-of-cluster-operation

To Reproduce Steps to reproduce the behavior:

  1. Go to '...'
  2. Select '....'
  3. Scroll down to '....'
  4. Note the error

Expected behavior The operator should scan the same workloads as in other clusters.

czunker avatar Aug 30 '23 06:08 czunker

The problem seems to be this volume, which exposes the host's root filesystem:

Volumes: []corev1.Volume{
  {
    Name: "root",
    VolumeSource: corev1.VolumeSource{
      // hostPath "/" is the part Autopilot rejects
      HostPath: &corev1.HostPathVolumeSource{Path: "/", Type: &unsetHostPath},
    },
  },
},

Which gets mounted here:

VolumeMounts: []corev1.VolumeMount{
  {
    Name:      "root",
    ReadOnly:  true,
    MountPath: "/mnt/host/",
  },
},

This causes the following error:

hostPath volume root used in container cnspec uses path / which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].
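To make the restriction concrete, here is a minimal sketch of a prefix check like the one the error message describes. The function name `hostPathAllowed` and the variable `autopilotAllowedPrefixes` are hypothetical, and the prefix list is copied from the error text above. How Autopilot actually performs the check, and whether the list differs between GKE versions, is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// autopilotAllowedPrefixes mirrors the list printed in the error message.
// Assumption: the real list may differ between GKE versions.
var autopilotAllowedPrefixes = []string{"/var/log/"}

// hostPathAllowed reports whether hostPath lies under one of the given
// prefixes. Each prefix is expected to end with a slash.
func hostPathAllowed(hostPath string, prefixes []string) bool {
	// Normalize the path and append a trailing slash so that "/var/log"
	// matches the prefix "/var/log/" while "/var/logs" does not.
	cleaned := path.Clean(hostPath)
	if !strings.HasSuffix(cleaned, "/") {
		cleaned += "/"
	}
	for _, prefix := range prefixes {
		if strings.HasPrefix(cleaned, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"/", "/var/log/pods", "/var/logs"} {
		fmt.Printf("%-15s allowed=%v\n", p, hostPathAllowed(p, autopilotAllowedPrefixes))
	}
}
```

A check along these lines could let the operator detect up front that the root mount will be rejected, instead of failing when the CronJob is created.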

I see that we currently rely on this volume for scanning:

		Spec: &v1.InventorySpec{
			Assets: []*asset.Asset{
				{
					Id:   "host",
					Name: node.Name,
					Connections: []*providers.Config{
						{
							Host:       "/mnt/host",
							Backend:    providers.ProviderType_FS,
							PlatformId: fmt.Sprintf("//platformid.api.mondoo.app/runtime/k8s/uid/%s/node/%s", clusterUID, node.UID),
						},
					},
					Labels: map[string]string{
						"k8s.mondoo.com/kind": "node",
					},
					ManagedBy: "mondoo-operator-" + clusterUID,
				},
			},
		},
	}

I don't have a solution currently; I just wanted to share my insights from taking a first look at this problem.

I assume we should at least be able to build better handling around this case, as it currently prevents the cronjob from being created at all.

mariuskimmina avatar Sep 09 '23 07:09 mariuskimmina

If node scanning fails, we currently stop and don't even attempt to scan Kubernetes resources. I wonder whether, for autopilot clusters, we could accept being unable to scan the nodes (I currently don't see a way to make that work) but still scan the resources in the cluster:

nodes := nodes.DeploymentHandler{
  Mondoo:                 mondooAuditConfig,
  KubeClient:             r.Client,
  MondooOperatorConfig:   config,
  ContainerImageResolver: r.ContainerImageResolver,
  IsOpenshift:            r.RunningOnOpenShift,
}

result, reconcileError = nodes.Reconcile(ctx)
if reconcileError != nil {
  log.Error(reconcileError, "Failed to set up nodes scanning")
}
if reconcileError != nil || result.Requeue {
  return result, reconcileError
}

workloads := k8s_scan.DeploymentHandler{
  Mondoo:                 mondooAuditConfig,
  KubeClient:             r.Client,
  MondooOperatorConfig:   config,
  ContainerImageResolver: r.ContainerImageResolver,
  ScanApiStore:           r.ScanApiStore,
}

result, reconcileError = workloads.Reconcile(ctx)
if reconcileError != nil {
  log.Error(reconcileError, "Failed to set up Kubernetes resources scanning")
}
if reconcileError != nil || result.Requeue {
  return result, reconcileError
}

mariuskimmina avatar Sep 09 '23 07:09 mariuskimmina

Turning off node scanning already allows the resources to show up

[image]

mariuskimmina avatar Sep 09 '23 07:09 mariuskimmina

Thanks @mariuskimmina, for digging deeper into this.

It's good to know what to look for. Perhaps other ways to scan a node work in GKE autopilot.

czunker avatar Sep 11 '23 04:09 czunker