fully-homomorphic-encryption

[Feature Request] Support for secure environments with no external IPs

Open christianepeters opened this issue 3 months ago • 0 comments

The ./cloud/tool script fails in projects where the constraints/compute.vmExternalIpAccess org policy is enforced, because the script assumes that TPU VMs get public IPs.

Error Log

google.api_core.exceptions.InvalidArgument: 400 Cloud TPU received a bad request. Constraint constraints/compute.vmExternalIpAccess violated. Please create TPUs without external IP addresses or remove the Organization Policy Constraint.

Proposed Solution

Disabling the org policy is an option, but not recommended, and in some environments won't be possible given org admin settings.

A comprehensive solution involves two features:

  1. provision the VM without a public IP, and
  2. provide a secure way to access it.

Note: If the VM also needs outbound access (e.g., to download packages), a Cloud NAT gateway would be required, but that is out of scope for this proposed change. More info at https://cloud.google.com/solutions/connecting-securely

We recommend making this solution opt-in so that the script's current default quick setup keeps working unchanged.


Part 1: Provisioning without an External IP

To comply with the vmExternalIpAccess policy, the script should offer an optional flag to create a TPU VM without a public IP and set up the necessary firewall rule for IAP access.

1. Add the optional --no-external-ip flag:

This flag should be off by default to maintain existing behavior.

# In class Tool, method provision(self):
        ...
        parser.add_argument(
            "--no-external-ip",
            action="store_true",
            default=False,  # Off by default
            help="Deploy the TPU VM without an external IP and provision an IAP firewall rule.",
        )
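
A minimal, self-contained sketch of how the opt-in flag behaves. The parser name and the helper `build_provision_parser` are hypothetical and stand in for the parser that the Tool class builds; only the `--no-external-ip` argument comes from the proposal above.

```python
import argparse


def build_provision_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the parser built in Tool.provision()."""
    parser = argparse.ArgumentParser(prog="tool provision")
    parser.add_argument(
        "--no-external-ip",
        action="store_true",  # store_true already defaults to False
        help="Deploy the TPU VM without an external IP "
        "and provision an IAP firewall rule.",
    )
    return parser


# Without the flag, existing behavior is preserved.
default_args = build_provision_parser().parse_args([])
# With the flag, argparse exposes it as args.no_external_ip.
secure_args = build_provision_parser().parse_args(["--no-external-ip"])
print(default_args.no_external_ip, secure_args.no_external_ip)
```

argparse converts the dashes in `--no-external-ip` to underscores, so the value would be read as `args.no_external_ip` when passed on to the provisioning code.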

2. Update cloud/provision.py to adapt its behavior:

The provision() function should be modified to:

  • Conditionally disable the external IP in the NetworkConfig.
    # In cloud/provision.py
    network_config = tpu_v2.NetworkConfig(
        network=network_url,
        subnetwork=subnet_url,
        enable_external_ips=not no_external_ip,  # Field is plural in tpu_v2.NetworkConfig; driven by the new flag
    )
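
One way to keep this logic testable without the Cloud TPU client installed is to compute the keyword arguments in a small pure function. This is only a sketch: `network_config_kwargs` is a hypothetical helper, the URLs in the usage lines are placeholders, and the field name `enable_external_ips` is taken from the Cloud TPU v2 API and should be checked against the installed google-cloud-tpu version.

```python
from typing import Any, Dict


def network_config_kwargs(
    network_url: str, subnet_url: str, no_external_ip: bool = False
) -> Dict[str, Any]:
    """Build keyword arguments intended for tpu_v2.NetworkConfig(**kwargs)."""
    return {
        "network": network_url,
        "subnetwork": subnet_url,
        # External IPs stay enabled unless the user opted out.
        "enable_external_ips": not no_external_ip,
    }


# Placeholder URLs, for illustration only.
default_cfg = network_config_kwargs("net-url", "subnet-url")
private_cfg = network_config_kwargs("net-url", "subnet-url", no_external_ip=True)
print(default_cfg["enable_external_ips"], private_cfg["enable_external_ips"])
```

In provision(), the result would then be passed along as `tpu_v2.NetworkConfig(**network_config_kwargs(...))`.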
    

Part 2: Accessing the VM via IAP Tunnel

A VM without a public IP cannot be reached over SSH directly from outside its network. IAP TCP forwarding provides a secure way to access it.

1. Add an optional --iap argument to the run command:

# In class Tool, method run(self):
        ...
        parser.add_argument(
            "--iap",
            action="store_true",
            default=False,
            help="Use IAP TCP forwarding for SSH and SCP.",
        )

2. Conditionally switch to gcloud alpha and add the tunnel flag:

When --iap is present, the script must use the alpha release channel of gcloud and append the --tunnel-through-iap flag to scp and ssh commands.

# In class Tool, method run(self), after parsing args:
        if args.iap:
            global tpu_vm_cmd_pref
            tpu_vm_cmd_pref = ["gcloud", "alpha", "compute", "tpus", "tpu-vm"]

# In vm_scp(...):
    cmd = [ *tpu_vm_cmd_pref, "scp", ... ]
    if "alpha" in tpu_vm_cmd_pref:
        cmd.append("--tunnel-through-iap")
    subprocess.run(cmd, check=True)

# In vm_run_ssh(...):
    gcloud_cmd = [ *tpu_vm_cmd_pref, "ssh", ... ]
    if "alpha" in tpu_vm_cmd_pref:
        gcloud_cmd.append("--tunnel-through-iap")
    res = subprocess.run(gcloud_cmd, stderr=subprocess.PIPE)
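
As a possible alternative to mutating a global and checking for "alpha" in the prefix, the command could be assembled by a helper that takes the IAP choice as an explicit argument. The sketch below is an assumption-laden illustration: `build_tpu_vm_cmd` is a hypothetical helper, the non-alpha base prefix is assumed to be the script's current default, and the node name and zone in the usage lines are placeholders.

```python
from typing import List, Sequence


def build_tpu_vm_cmd(
    subcommand: str, extra_args: Sequence[str], use_iap: bool = False
) -> List[str]:
    """Assemble a gcloud TPU VM command list suitable for subprocess.run()."""
    # Assumed default prefix; switch to the alpha channel only for IAP,
    # as the proposal above describes.
    prefix = ["gcloud", "compute", "tpus", "tpu-vm"]
    if use_iap:
        prefix.insert(1, "alpha")
    cmd = [*prefix, subcommand, *extra_args]
    if use_iap:
        cmd.append("--tunnel-through-iap")
    return cmd


# Placeholder node name and zone, for illustration only.
ssh_cmd = build_tpu_vm_cmd("ssh", ["my-tpu", "--zone=us-central1-a"], use_iap=True)
print(" ".join(ssh_cmd))
```

Both vm_scp and vm_run_ssh could call the same helper with `use_iap=args.iap`, which keeps the decision tied to the flag rather than to the contents of the prefix list.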

3. Automatically provision the IAP firewall rule:

IAP TCP forwarding requires a firewall rule that allows ingress on the SSH port (TCP 22) from the IAP range 35.235.240.0/20. The cloud/provision.py script should create this rule programmatically when the --no-external-ip flag is used. Details: https://cloud.google.com/iap/docs/using-tcp-forwarding
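
A rough sketch of what the rule could look like, expressed as a gcloud command list rather than as a Compute API call (the issue does not say which mechanism provision.py should use). `build_iap_firewall_cmd` is a hypothetical helper, and the rule name, network, and project in the usage lines are placeholders. The source range 35.235.240.0/20 is the range that IAP uses for TCP forwarding.

```python
from typing import List

# Source range used by IAP for TCP forwarding.
IAP_TCP_FORWARDING_RANGE = "35.235.240.0/20"


def build_iap_firewall_cmd(rule_name: str, network: str, project: str) -> List[str]:
    """Build a gcloud command that allows SSH ingress from the IAP range."""
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--project={project}",
        f"--network={network}",
        "--direction=INGRESS",
        "--action=allow",
        "--rules=tcp:22",  # SSH only
        f"--source-ranges={IAP_TCP_FORWARDING_RANGE}",
    ]


# Placeholder names, for illustration only.
firewall_cmd = build_iap_firewall_cmd("allow-ssh-from-iap", "my-network", "my-project")
print(" ".join(firewall_cmd))
```

The list can be handed to `subprocess.run(firewall_cmd, check=True)`. A real implementation would probably also need to tolerate the rule already existing from an earlier run.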

Test

Toggle the constraints/compute.vmExternalIpAccess org policy, then deploy with the new --no-external-ip flag and connect with --iap.

christianepeters, Sep 25 '25 15:09