cluster-api-provider-gcp

Can I configure GCP Instance Network Settings more freely?

Open oingooing opened this issue 2 years ago • 5 comments

/kind feature

Describe the solution you'd like

Hi, when creating a cluster I would like more freedom in specifying instance information, because I need to use a Shared VPC from another project, and I don't even need a router.

What I would like is one of the following:

  1. Using a Shared VPC in another project or
  2. Using predefined instance templates

Is there any way? Thanks.

oingooing avatar Mar 16 '23 08:03 oingooing

I raised this issue on Slack a while ago, but it didn't catch anyone's attention. I've pasted a copy of it below.


Hey! We've run into a bit of an issue with the cluster-api provider for GCP. Our org is using Shared VPCs to give app teams some degree of sovereignty while simultaneously maintaining control over some central pieces of infrastructure and policies. Unfortunately, there's no support for the Shared VPC use case in CAPG, and to our surprise a search of the GitHub issues as well as this Slack channel yielded no results. Is this something that has been considered before? If yes, is it on the roadmap? If not, is this something you'd accept a contribution for?

Arguably, Shared VPCs come with their own complexities and apparently go beyond the requirements that users of cluster-api on GCP have had thus far. We've done some rather dirty hacking on the CAPG controller to make it parse fully qualified subnet self-links in addition to bare names, which is in line with the way this is handled on GCP in general.

diff --git a/cloud/scope/machine.go b/cloud/scope/machine.go
index 2cdfa1fa1..6bd5c863e 100644
--- a/cloud/scope/machine.go
+++ b/cloud/scope/machine.go
@@ -270,11 +270,81 @@ func (m *MachineScope) InstanceAdditionalDiskSpec() []*compute.AttachedDisk {
 	return additionalDisks
 }
 
+// subnetHasSlashes returns true if the subnet *m.GCPMachine.Spec.Subnet is not nil and
+// contains a path like
+// projects/my-host-project/regions/europe-west3/subnetworks/my-subnetwork,
+// and false if the subnet was defined without slashes (e.g. my-subnetwork).
+func (m *MachineScope) subnetHasSlashes() bool {
+	if m.GCPMachine.Spec.Subnet == nil {
+		return false
+	}
+
+	if strings.Contains(*m.GCPMachine.Spec.Subnet, "/") {
+		return true
+	}
+
+	return false
+}
+
+// getProjectFromSubnet returns the project name, if set in Subnet; nil otherwise
+func (m *MachineScope) getProjectFromSubnet() *string {
+	if m.GCPMachine.Spec.Subnet == nil {
+		return nil
+	}
+
+	if m.subnetHasSlashes() {
+		subnetSlice := strings.Split(*m.GCPMachine.Spec.Subnet, "/")
+		if len(subnetSlice) >= 2 && subnetSlice[0] == "projects" && len(subnetSlice[1]) > 0 {
+			return &subnetSlice[1]
+		}
+	}
+
+	return nil
+}
+
+// getNetworkInterfacePath returns the default network path if the subnet is given as a bare name such as "my-subnetwork".
+// If the subnet starts with projects/my-host-project and the cluster's project name differs from my-host-project, the project
+// in the network path is replaced by my-host-project. This is the case if you need to bind a machine to a Shared VPC in a host project.
+func (m *MachineScope) getNetworkInterfacePath() string {
+	defaultPath := path.Join("projects", m.ClusterGetter.Project(), "global", "networks", m.ClusterGetter.NetworkName())
+
+	subnetProject := m.getProjectFromSubnet()
+	if subnetProject == nil {
+		return defaultPath
+	}
+
+	if m.ClusterGetter.Project() != *subnetProject {
+		defaultPath = path.Join("projects", *subnetProject, "global", "networks", m.ClusterGetter.NetworkName())
+	}
+
+	return defaultPath
+}
+
+// getSubnetworkPath returns the default subnet path if there is no project in the spec subnet path.
+// If the subnet is defined by "projects/my-host-project/regions/europe-west3/subnetworks/my-subnetwork"
+// the complete path will be returned.
+// Warning: There is no path checking. You have to create the correct path yourself!
+func (m *MachineScope) getSubnetworkPath() string {
+	// We don't check m.GCPMachine.Spec.Subnet != nil here; the caller only invokes this when Subnet is set.
+	defaultPath := path.Join("regions", m.ClusterGetter.Region(), "subnetworks", *m.GCPMachine.Spec.Subnet)
+
+	subnetProject := m.getProjectFromSubnet()
+	if subnetProject != nil {
+		// We can dereference the subnet because getProjectFromSubnet only returns non-nil when m.GCPMachine.Spec.Subnet is set.
+		// Replace the path with the path specified by m.GCPMachine.Spec.Subnet.
+		defaultPath = *m.GCPMachine.Spec.Subnet
+	}
+
+	return defaultPath
+}
+
 // InstanceNetworkInterfaceSpec returns compute network interface spec.
 func (m *MachineScope) InstanceNetworkInterfaceSpec() *compute.NetworkInterface {
 	networkInterface := &compute.NetworkInterface{
-		Network: path.Join("projects", m.ClusterGetter.Project(), "global", "networks", m.ClusterGetter.NetworkName()),
+		Network: m.getNetworkInterfacePath(),
 	}
+	// TODO: replace with Debug logger (if available) or remove
+	fmt.Printf("#### InstanceNetworkInterfaceSpec networkInterface: %+v\n", m.getNetworkInterfacePath())
 
 	if m.GCPMachine.Spec.PublicIP != nil && *m.GCPMachine.Spec.PublicIP {
 		networkInterface.AccessConfigs = []*compute.AccessConfig{
@@ -286,7 +356,9 @@ func (m *MachineScope) InstanceNetworkInterfaceSpec() *compute.NetworkInterface
 	}
 
 	if m.GCPMachine.Spec.Subnet != nil {
-		networkInterface.Subnetwork = path.Join("regions", m.ClusterGetter.Region(), "subnetworks", *m.GCPMachine.Spec.Subnet)
+		networkInterface.Subnetwork = m.getSubnetworkPath()
+		// TODO: replace with Debug logger (if available) or remove
+		fmt.Printf("#### InstanceNetworkInterfaceSpec subnet is set: %+v\n", networkInterface.Subnetwork)
 	}
 
 	return networkInterface

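For illustration, here's a standalone sketch (not part of the patch, just mirroring its parsing logic with the placeholder project/subnetwork names from the comments above): a bare name keeps today's behaviour of joining region-relative path segments, while a fully qualified self-link is taken as-is and yields the host project.

package main

import (
	"fmt"
	"path"
	"strings"
)

// projectFromSubnet mirrors getProjectFromSubnet from the diff above: it returns the
// project name if the subnet is a self-link of the form
// projects/<project>/regions/<region>/subnetworks/<name>.
func projectFromSubnet(subnet string) (string, bool) {
	if !strings.Contains(subnet, "/") {
		return "", false
	}
	parts := strings.Split(subnet, "/")
	if len(parts) >= 2 && parts[0] == "projects" && parts[1] != "" {
		return parts[1], true
	}
	return "", false
}

func main() {
	// Bare name: resolved relative to the cluster's own region, as CAPG does today.
	fmt.Println(path.Join("regions", "europe-west3", "subnetworks", "my-subnetwork"))

	// Fully qualified self-link into a Shared VPC host project: used verbatim.
	subnet := "projects/my-host-project/regions/europe-west3/subnetworks/my-subnetwork"
	if project, ok := projectFromSubnet(subnet); ok {
		fmt.Println("host project:", project)
		fmt.Println("subnetwork path:", subnet)
	}
}
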
This is limited to the Machine scope, which was enough for us to be able to create a cluster with GCE instances attached to one project while their network interfaces are attached to a host VPC in another project. We've worked around the rest by using Terraform to create the required firewall rules ourselves, since the CAPG controller creates them in the wrong project and network¹. Additionally, we had to pass a config file to the target cluster's controller-manager in order to make it provision load balancer components in the correct project, which finally yielded a cluster working the way you'd (or at least we'd) expect it to work in a Shared VPC.
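
For completeness, the controller-manager part of the workaround amounts to pointing the cloud provider's cloud-config at the host project for networking. A rough sketch of the sort of thing we pass, assuming the legacy GCE cloud provider config format (project and network names are placeholders; the exact keys depend on which cloud controller manager you run):

[global]
; project the GCE instances live in
project-id = my-service-project
; host project that owns the Shared VPC
network-project-id = my-host-project
network-name = my-shared-vpc
subnetwork-name = my-subnetwork
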

We'd like to flesh this out and get it upstreamed so we don't have the burden of maintaining a fork and can drop the various workarounds, but before putting any work towards it we decided it'd be best to hop in here and ask for input/opinions/guidance/things we might have missed, and some general advice on how to go about implementing this with minimal friction and without introducing breaking changes. Please tag me in replies, as I don't get to check Slack very often. Thank you :)

¹: It creates a new VPC with the same name as the actual target VPC, residing in the project it creates the machines in, then creates firewall rules for that VPC to allow intra-cluster traffic. These rules don't come into effect, because the cluster's nodes are not running in that network.

itspngu avatar Mar 19 '23 20:03 itspngu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 17 '23 21:06 k8s-triage-robot

/remove-lifecycle stale

itspngu avatar Jun 19 '23 12:06 itspngu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 22 '24 22:01 k8s-triage-robot

/remove-lifecycle stale

itspngu avatar Jan 23 '24 09:01 itspngu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 22 '24 09:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 22 '24 09:05 k8s-triage-robot

/remove-lifecycle stale

itspngu avatar May 22 '24 09:05 itspngu