cluster-api-provider-cloud-director

Some API calls do not use the https_proxy/no_proxy settings defined at the controller pod level.

Open FrancoisKlieberOrange opened this issue 1 year ago • 2 comments

Describe the bug

It appears that not all calls made by cluster-api-provider-cloud-director are utilizing the proxy settings defined in the environment variables.

Environment Details:

  • We have a cluster deployed in an environment that requires a proxy to connect to Cloud Director.
  • All Cluster-API providers are deployed within this environment.
  • The deployment of cluster-api-provider-cloud-director has been updated with https_proxy and no_proxy environment variables.

Here is the deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    cluster.x-k8s.io/provider: infrastructure-vcd
    clusterctl.cluster.x-k8s.io: ""
    control-plane: controller-manager
  name: capvcd-controller-manager
  namespace: capvcd-system
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: infrastructure-vcd
      control-plane: controller-manager
  template:
    metadata:
      labels:
        cluster.x-k8s.io/provider: infrastructure-vcd
        control-plane: controller-manager
    spec:
      containers:
      - command:
        - /opt/vcloud/bin/cluster-api-provider-cloud-director
        env:
        - name: https_proxy
          value: <proxy settings>
        - name: no_proxy
          value: localhost,.svc,.cluster.local,.svc.cluster.local,<pod cidr>,<service cidr>
        image: projects.registry.vmware.com/vmware-cloud-director/cluster-api-provider-cloud-director:v1.3.0
        imagePullPolicy: IfNotPresent
        

Issue Observed:

During the load balancer creation step, the controller tries to resolve the Cloud Director hostname directly through CoreDNS (10.96.0.10:53) instead of sending the request through the proxy. The following error is logged:

Reconciler error    
{"controller": "vcdcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "error": "failed to create gateway manager using the workload client to reconcile cluster [<cluster name>]: error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>]  : [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]", "errorVerbose": "error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>]: [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]\nfailed to create gateway manager using the workload client to reconcile cluster [<cluster name>]

Interestingly, other API calls are successful, such as token creation and determining which API version to use. These calls fail if the proxy is not set in the environment variables, indicating that some calls are respecting the proxy settings:

auth.go:50] Using VCD OpenAPI version [37.2]
client.go:201] Client is sysadmin: [false]  

Additional Information:

  • If the cluster is deployed in an environment that does not require a proxy, all API calls, including those that fail in the proxy-requiring environment, are successful.

Reproduction steps

  1. Deploy a cluster in an environment that requires a proxy to connect to Cloud Director.
  2. Set the https_proxy and no_proxy environment variables in the cluster-api-provider-cloud-director deployment.
  3. Observe the error during the load balancer creation step, as detailed above.
  4. Deploy the same cluster in an environment that does not require a proxy and observe that all API calls are successful.

Expected behavior

All API calls made by cluster-api-provider-cloud-director should utilize the proxy settings defined in the environment variables.

Additional context

No response

FrancoisKlieberOrange, May 21 '24 14:05

If we search for http.Client{ in the cpi-vcd 1.6.1 code base (https://github.com/vmware/cloud-provider-for-cloud-director/tree/1.6.z), we see that the proxy is not configured consistently across the places where an HTTP client is created.

vcdsdk/client.go, method RefreshBearerToken:
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L113
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L125

vcdsdk/client.go, method NewVCDClientFromSecrets -> vcdsdk/auth.go, method GetSwaggerClientFromSecrets:
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/auth.go#L99

This will need a fix in cpi-vcd, and that fix then needs to be consumed in capvcd.

rocknes, Aug 01 '24 19:08

Hello @arunmk @rocknes, do you have any updates to share on this issue? It is really blocking us. Thanks.

cyrillep, Jan 22 '25 15:01