container-service-extension

External NAT Duplicate IPs

Open wynner opened this issue 3 years ago • 2 comments

Describe the bug

When deploying K8s clusters via the CLI in close succession, the same external NAT IP on the edge gateway (T1) can be assigned to two clusters at the same time. This causes one or both cluster deployments to fail.

CSE needs to ensure that the external IP claimed for the first cluster's DNAT rule has actually been applied before an external IP is assigned to the second cluster.
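One way to meet this requirement is to make claiming an external IP a single atomic check-and-reserve step. The sketch below is not CSE's code: `ExternalIpPool`, `claim()` and `release()` are hypothetical names, and the addresses come from the 203.0.113.0/24 documentation range. It shows the "is this IP free?" check and the "mark it as used" update happening under one lock, so two deployments started seconds apart cannot receive the same address.

```python
import threading
from typing import Dict, Iterable, Optional


class ExternalIpPool:
    """Hypothetical external-IP allocator for an edge gateway (illustration only)."""

    def __init__(self, ips: Iterable[str]) -> None:
        self._free = list(ips)
        self._claimed: Dict[str, str] = {}  # ip -> cluster that owns it
        self._lock = threading.Lock()

    def claim(self, cluster: str) -> Optional[str]:
        # Check and reserve under one lock: no other caller can observe
        # the IP as free between the check and the update.
        with self._lock:
            if not self._free:
                return None
            ip = self._free.pop(0)
            self._claimed[ip] = cluster
            return ip

    def release(self, ip: str) -> None:
        # Return an IP to the pool, e.g. if creating the DNAT rule failed.
        with self._lock:
            if self._claimed.pop(ip, None) is not None:
                self._free.append(ip)


# Two deployments racing for an IP, as in the reproduction steps.
pool = ExternalIpPool(["203.0.113.10", "203.0.113.11"])
results: Dict[str, Optional[str]] = {}


def deploy(name: str) -> None:
    results[name] = pool.claim(name)


threads = [threading.Thread(target=deploy, args=(n,))
           for n in ("cluster-tkg-1.21", "cluster-tkg-1.22")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # each cluster gets a different IP
```

A `threading.Lock` only serializes callers inside one process. If the CSE server handles requests in several processes or hosts, the reservation would have to live in a shared store (for example a database row or a gateway-side reservation) instead; that is an assumption about the deployment, not a description of CSE's design.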

Reproduction steps

Run the CLI command for the first cluster (vcd cse cluster apply cluster-tkg-1.21.yaml), followed within seconds by the command for the second cluster (vcd cse cluster apply cluster-tkg-1.22.yaml).

It does not reproduce every time; however, it usually takes only a couple of attempts to demonstrate the issue.
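The reproduction steps can be scripted so the two applies are launched a fixed number of seconds apart. This is a minimal sketch that only wraps the `vcd cse cluster apply` command quoted above; the helper names and the 2-second default gap are hypothetical, and it assumes `vcd` is on the PATH and already logged in.

```python
import subprocess
import time
from typing import List

# Cluster specs named in the reproduction steps.
SPECS = ["cluster-tkg-1.21.yaml", "cluster-tkg-1.22.yaml"]


def apply_command(spec: str) -> List[str]:
    # Builds the CLI invocation from the issue: vcd cse cluster apply <spec>
    return ["vcd", "cse", "cluster", "apply", spec]


def reproduce(gap_seconds: float = 2.0) -> List[int]:
    # Start each apply without waiting for the previous one to finish,
    # leaving only gap_seconds between launches to provoke the race.
    procs = []
    for i, spec in enumerate(SPECS):
        if i:
            time.sleep(gap_seconds)
        procs.append(subprocess.Popen(apply_command(spec)))
    # Exit codes of both CLI runs, in launch order.
    return [p.wait() for p in procs]
```

Calling `reproduce()` a few times in a row should surface the duplicate external IP if the race is present.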

Expected behavior

Each K8s cluster receives and consumes a unique external IP on the T1 edge gateway, used to expose the cluster's control plane (master) via DNAT.

Additional context

Please contact me on internal VMware email/slack if required to explain further.

wynner, May 08 '22 11:05

Hi @wynner, thanks for bringing this problem to our attention. The current suggested workaround is to let the first cluster's control plane come up before deploying the second cluster. We suggest waiting at least 1 minute between deployments to help avoid this problem.
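Until the allocation is fixed, the suggested workaround can be applied mechanically by running the applies one after another with a pause between them. This is a hypothetical helper, not part of CSE; the 60-second default follows the "at least 1 minute" advice, and the `run` and `sleep` parameters exist only so the behaviour can be checked without calling `vcd`.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def apply_sequentially(
    specs: Sequence[str],
    wait_seconds: float = 60.0,
    run: Callable[[List[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Apply each cluster spec in order, pausing between deployments."""
    codes = []
    for i, spec in enumerate(specs):
        if i:
            # Give the previous cluster's control plane time to come up
            # (and its DNAT IP time to be applied) before the next apply.
            sleep(wait_seconds)
        codes.append(run(["vcd", "cse", "cluster", "apply", spec]))
    return codes
```

For example, `apply_sequentially(["cluster-tkg-1.21.yaml", "cluster-tkg-1.22.yaml"])` deploys both clusters from the issue with a one-minute gap. A fixed delay only lowers the chance of a collision; it does not guarantee unique IPs.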

lzichong, May 09 '22 00:05

Thanks. I'm doing this; however, I think you'll agree this is something that needs to be addressed. FYI, I'm running CSE 3.1.3 and VCD 10.3.3 with the latest native and TKGm templates.

wynner, May 24 '22 07:05