Some API calls do not use the https_proxy/no_proxy settings set at the controller pod level.
Describe the bug
It appears that not all calls made by cluster-api-provider-cloud-director are utilizing the proxy settings defined in the environment variables.
Environment Details:
- We have a cluster deployed in an environment that requires a proxy to connect to Cloud Director.
- All Cluster-API providers are deployed within this environment.
- The deployment of cluster-api-provider-cloud-director has been updated with https_proxy and no_proxy environment variables.
Here is the deployment configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    cluster.x-k8s.io/provider: infrastructure-vcd
    clusterctl.cluster.x-k8s.io: ""
    control-plane: controller-manager
  name: capvcd-controller-manager
  namespace: capvcd-system
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: infrastructure-vcd
      control-plane: controller-manager
  template:
    metadata:
      labels:
        cluster.x-k8s.io/provider: infrastructure-vcd
        control-plane: controller-manager
    spec:
      containers:
      - command:
        - /opt/vcloud/bin/cluster-api-provider-cloud-director
        env:
        - name: https_proxy
          value: <proxy settings>
        - name: no_proxy
          value: localhost,.svc,.cluster.local,.svc.cluster.local,<pod cidr>,<service cidr>
        image: projects.registry.vmware.com/vmware-cloud-director/cluster-api-provider-cloud-director:v1.3.0
        imagePullPolicy: IfNotPresent
Issue Observed:
During the load balancer creation step, the controller attempts to resolve the Cloud Director hostname directly through CoreDNS (10.96.0.10:53) instead of sending the request through the proxy. Below is the error message encountered:
Reconciler error
{"controller": "vcdcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "error": "failed to create gateway manager using the workload client to reconcile cluster [<cluster name>]: error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>] : [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]", "errorVerbose": "error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>]: [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]\nfailed to create gateway manager using the workload client to reconcile cluster [<cluster name>]
Interestingly, other API calls are successful, such as token creation and determining which API version to use. These calls fail if the proxy is not set in the environment variables, indicating that some calls are respecting the proxy settings:
auth.go:50] Using VCD OpenAPI version [37.2]
client.go:201] Client is sysadmin: [false]
Additional Information:
- If the cluster is deployed in an environment that does not require a proxy, all API calls, including those that fail in the proxy-requiring environment, are successful.
Reproduction steps
- Deploy a cluster in an environment that requires a proxy to connect to Cloud Director.
- Set the https_proxy and no_proxy environment variables in the cluster-api-provider-cloud-director deployment.
- Observe the error during the load balancer creation step, as detailed above.
- Deploy the same cluster in an environment that does not require a proxy and observe that all API calls are successful.
Expected behavior
All API calls made by cluster-api-provider-cloud-director should utilize the proxy settings defined in the environment variables.
Additional context
No response
Searching for http.Client{ in the cpi-vcd 1.6.1 code base (https://github.com/vmware/cloud-provider-for-cloud-director/tree/1.6.z) shows that proxy setup is inconsistent across the places where HTTP clients are created.
- vcdsdk/client.go, method RefreshBearerToken: https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L113 and https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L125
- vcdsdk/client.go, method NewVCDClientFromSecrets -> vcdsdk/auth.go, method GetSwaggerClientFromSecrets: https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/auth.go#L99
This will need a fix in cpi-vcd, and that fix then needs to be consumed in capvcd.
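To illustrate the kind of inconsistency described above, here is a minimal Go sketch. It is not the cpi-vcd code: the function names newTransportWithoutProxy, newTransportWithProxy and usesProxy are hypothetical, and the assumption is that the failing code path builds a custom http.Transport (for example to set TLS options) without setting its Proxy field. In Go's net/http, a Transport with a nil Proxy never uses a proxy, while http.DefaultTransport sets Proxy to http.ProxyFromEnvironment, which reads https_proxy and no_proxy.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newTransportWithoutProxy is a hypothetical example of the problematic
// pattern: a custom transport whose Proxy field is left nil, so the
// https_proxy/no_proxy environment variables are ignored.
func newTransportWithoutProxy(insecure bool) *http.Transport {
	return &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: insecure},
	}
}

// newTransportWithProxy is a hypothetical example of a possible fix: the
// same custom transport, but with Proxy set to http.ProxyFromEnvironment
// so that requests honor the proxy environment variables.
func newTransportWithProxy(insecure bool) *http.Transport {
	return &http.Transport{
		Proxy:           http.ProxyFromEnvironment,
		TLSClientConfig: &tls.Config{InsecureSkipVerify: insecure},
	}
}

// usesProxy reports whether the transport has any proxy function configured.
func usesProxy(t *http.Transport) bool {
	return t.Proxy != nil
}

func main() {
	direct := &http.Client{Transport: newTransportWithoutProxy(false)}
	proxied := &http.Client{Transport: newTransportWithProxy(false)}

	fmt.Println("without Proxy field:", usesProxy(direct.Transport.(*http.Transport)))
	fmt.Println("with ProxyFromEnvironment:", usesProxy(proxied.Transport.(*http.Transport)))
}
```

If this assumption holds, it would also explain why some calls succeed: an http.Client created without a custom Transport falls back to http.DefaultTransport, which already honors the proxy environment variables.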
Hello @arunmk @rocknes, do you have any updates to share on this issue? It is really blocking us. Thanks.