container-service-extension
External NAT Duplicate IPs
Describe the bug
When K8s clusters are deployed via the CLI in close succession, the same external NAT IP on the edge gateway (T1) can be assigned to two clusters at the same time. This causes one or both cluster deployments to fail.
CSE needs to ensure that the DNAT IP claimed for the first K8s cluster deployment is applied before an external IP is assigned to the second cluster.
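To illustrate the kind of serialization being asked for, here is a minimal sketch in Python. It is not CSE's actual code: `NatIpPool`, `claim`, and `deploy_concurrently` are hypothetical names, and the pool lives in memory rather than on the edge gateway. The point is only that selecting a free IP and recording it as claimed happen under one lock, so two concurrent deployments cannot pick the same address.

```python
import threading


class NatIpPool:
    """Hypothetical in-memory pool of external IPs for DNAT on one edge gateway."""

    def __init__(self, free_ips):
        self._free = list(free_ips)
        self._claimed = {}  # ip -> name of the cluster that holds it
        self._lock = threading.Lock()

    def claim(self, cluster_name):
        # Selecting the IP and recording the claim happen under the same lock,
        # so a concurrent caller can never observe the IP as still free.
        with self._lock:
            if not self._free:
                raise RuntimeError("no free external IP left on the gateway")
            ip = self._free.pop(0)
            self._claimed[ip] = cluster_name
            return ip


def deploy_concurrently(pool, cluster_names):
    """Simulate several cluster deployments claiming an IP at the same time."""
    results = {}

    def worker(name):
        results[name] = pool.claim(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in cluster_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In a real deployment the claim would have to be serialized wherever the gateway state actually lives, and possibly across processes; an in-process lock like this one only demonstrates the idea.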
Reproduction steps
Run the CLI command for the first cluster (vcd cse cluster apply cluster-tkg-1.21.yaml), followed within seconds by the command for the second cluster (vcd cse cluster apply cluster-tkg-1.22.yaml).
The issue does not reproduce every time, but it only takes a couple of attempts to demonstrate it.
Expected behavior
Each K8s cluster receives and consumes a unique external IP on the T1 edge gateway for the purpose of exposing the cluster master via DNAT.
Additional context
Please contact me via internal VMware email or Slack if further explanation is required.
Hi @wynner, thanks for bringing this problem to our attention. The currently suggested workaround is to allow the first cluster's control plane to come up before attempting to deploy the second cluster. We suggest allowing at least 1 minute between deployments to help avoid this problem.
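The workaround can be scripted so the delay is not forgotten. The following is a rough sketch, assuming the `vcd` CLI is on the PATH and already logged in; `apply_sequentially` and its `run`/`sleep` parameters are hypothetical names introduced here for illustration. Note that it only waits a fixed time between deployments and does not check that the first control plane is actually up.

```python
import subprocess
import time


def apply_sequentially(spec_files, delay_seconds=60,
                       run=subprocess.run, sleep=time.sleep):
    """Apply each cluster spec in order, pausing between deployments.

    `run` and `sleep` are injectable so the function can be exercised
    without a real vcd CLI; by default they call the standard library.
    """
    for index, spec in enumerate(spec_files):
        if index > 0:
            # Fixed wait so the previous cluster can claim its DNAT IP first.
            sleep(delay_seconds)
        # check=True stops the sequence if a deployment command fails.
        run(["vcd", "cse", "cluster", "apply", spec], check=True)
```

For the two specs from the reproduction steps, this would be called as `apply_sequentially(["cluster-tkg-1.21.yaml", "cluster-tkg-1.22.yaml"])`.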
Thanks. I'm doing this; however, I think you may agree this is something that needs to be looked at. FYI, the versions I'm running are CSE 3.1.3 and VCD 10.3.3, with the latest templates for native and TKGm.