provider-gcp icon indicating copy to clipboard operation
provider-gcp copied to clipboard

issue with spec.forProvider.addonsConfig.networkPolicyConfig when using provider-gcp

Open jesumyip opened this issue 4 years ago • 3 comments

Environment:

  1. Cluster version: 1.20.9-gke.1001
  2. Using gcloud binary in Google Cloud Shell.
  3. Crossplane version 1.4.1.
  4. Crossplane provider-gcp version 0.18.0.

Description of problem:

  1. Create a nodepool + GKE cluster using the following YAML manifests:
apiVersion: container.gcp.crossplane.io/v1beta2
kind: Cluster
metadata:
  name: gke-crossplane-cluster4
spec:
  providerConfigRef:
    name: default
  forProvider:
    networkConfig:
      enableIntraNodeVisibility: true
    loggingService: logging.googleapis.com/kubernetes
    monitoringService: monitoring.googleapis.com/kubernetes
    network: "xxxxxxxxx"
    subnetwork: "xxxxxxxx"
    addonsConfig:
      gcePersistentDiskCsiDriverConfig:
        enabled: true
      dnsCacheConfig:
        enabled: false
      horizontalPodAutoscaling:
        disabled: true
      httpLoadBalancing:
        disabled: false
      networkPolicyConfig:
        disabled: false
    location: asia-southeast1
    binaryAuthorization: 
      enabled: false
    legacyAbac:
      enabled: false
    masterAuth:
      clientCertificateConfig:
        issueClientCertificate: false
---
apiVersion: container.gcp.crossplane.io/v1beta1
kind: NodePool
metadata:
  name: gke-crossplane-nodepool4
spec:
  providerConfigRef:
    name: default
  forProvider:
    autoscaling:
      autoprovisioned: false
      enabled: true
      maxNodeCount: 2
      minNodeCount: 1
    clusterRef:
      name: gke-crossplane-cluster4
    config:
      diskSizeGb: 100
      diskType: pd-standard
      imageType: cos_containerd
      labels:
        test-label: crossplane-created
      machineType: e2-medium
      oauthScopes:
        - "https://www.googleapis.com/auth/devstorage.read_only"
        - "https://www.googleapis.com/auth/logging.write"
        - "https://www.googleapis.com/auth/monitoring"
        - "https://www.googleapis.com/auth/servicecontrol"
        - "https://www.googleapis.com/auth/service.management.readonly"
        - "https://www.googleapis.com/auth/trace.append"
    initialNodeCount: 2
    locations:
      - xxxxxxxxxx
    management:
      autoRepair: true
      autoUpgrade: true

  1. Cluster and nodepool creation are successful. However, cluster status is stuck at "RECONCILING" for a very long time (did not test how long, I let it stay this way for > 30minutes before I abandoned the test).

image

  1. If I modify the cluster manifest and change spec.forProvider.addonsConfig.networkPolicyConfig.disabled = true, and recreate the cluster + nodepool, the problem goes away. Cluster becomes normal after approximately 6-7minutes.

  2. Next, I tried creating the cluster with spec.forProvider.addonsConfig.networkPolicyConfig.disabled = true and wait till the cluster is healthy.

  3. I then manually tried to enable networkPolicyConfig via: gcloud container clusters update gke-crossplane-cluster4 --update-addons=NetworkPolicy=ENABLED --zone xxxxxx as per instructions at https://cloud.google.com/kubernetes-engine/docs/how-to/network-policy image the command did not return for a very long time, and I hit ctrl-c.

  4. Upon running kubectl describe cluster gke-crossplane-cluster4, I noticed the problem returns. The cluster is once again in a constant state of being reconciled.

It appears as though GCP is accepting networkPolicyConfig being enabled and this is causing the cluster to be stuck in a constant state of being reconciled. Is this a provider-gcp issue or a GCP issue?

jesumyip avatar Oct 01 '21 11:10 jesumyip

@jesumyip thanks a lot for reporting and the detailed steps!

Since it is also happening with gcloud cli I would suspect this to be GCP issue. I am wondering if this is specific to a k8s version?

turkenh avatar Oct 01 '21 15:10 turkenh

Repeated the test with spec.forProvider.initialCluster = 1.18, and spec.forProvider.addonsConfig.networkPolicyConfig.disabled = true

apiVersion: container.gcp.crossplane.io/v1beta2
kind: Cluster
metadata:
  name: gke-crossplane-cluster4
spec:
  providerConfigRef:
    name: default
  forProvider:
    initialClusterVersion: "1.18"
    networkConfig:
      enableIntraNodeVisibility: true
    loggingService: logging.googleapis.com/kubernetes
    monitoringService: monitoring.googleapis.com/kubernetes
    network: "xxxxxxxxxxxx"
    subnetwork: "xxxxxxxxxxxxxx"
    addonsConfig:
      gcePersistentDiskCsiDriverConfig:
        enabled: true
      dnsCacheConfig:
        enabled: false
      horizontalPodAutoscaling:
        disabled: true
      httpLoadBalancing:
        disabled: false
      networkPolicyConfig:
        disabled: true
    location: xxxxxxxx
    binaryAuthorization: 
      enabled: false
    legacyAbac:
      enabled: false
    masterAuth:
      clientCertificateConfig:
        issueClientCertificate: false

Cluster becomes normal after about 6minutes: image

Then, execute gcloud container clusters update gke-crossplane-cluster4 --update-addons=NetworkPolicy=ENABLED --zone xxxxxxx and left this running for 9 minutes. It never returned, so I stopped it.

image

At this point, cluster is stuck in "RECONCILING" state. Also, I cannot enable network policy on the nodes because you first have to enable it on the cluster.

image

In conclusion, v1.20 and 1.18 of K8s = same result.

If this is a GCP issue, I guess I just have to remember not to enable network policies.

jesumyip avatar Oct 02 '21 04:10 jesumyip

Enabling the network policy using GCP's web GUI also produces the same result.

image

image

image

I have filed an issue report with Google.

jesumyip avatar Oct 02 '21 04:10 jesumyip