
Suggestion: Allow adding `maxcount` to `<input type="email/file" multiple>`

Open ThomasLandauer opened this issue 1 month ago • 1 comments

What problem are you trying to solve?

For <input>s with multiple, it would be nice to apply a limit on the number of items.

What solutions exist today?

The maxlength attribute, but it only restricts the total number of characters. For email addresses, the more important limit is how many addresses are entered, not how long they are.

How would you solve it?

<input maxcount="5">. The wording aligns nicely with existing maxlength.

Anything else?

There is a related issue for <input type="file">, but it focuses more on the number of bytes: https://github.com/whatwg/html/issues/4923

ThomasLandauer avatar Dec 05 '25 09:12 ThomasLandauer

A cluster that has been up and working properly is running GKE version 1.32.8-gke.1108000, and the new, broken cluster is GKE version 1.33.4-gke.1134000.

chipkent avatar Oct 08 '25 16:10 chipkent

I created a new 1.32.8-gke.1108000 cluster, and the segfault is still present. The problem looks more complex than a simple version change.

In this case, 1 of 3 pods came up ok.

chipkent avatar Oct 08 '25 17:10 chipkent

For 1.32.8-gke.1108000, repeated pod restarts by hand resulted in emissary-ingress coming up ok. On 1.33.4-gke.1134000, I was never able to get a pod to come up.

chipkent avatar Oct 08 '25 20:10 chipkent

GCP may be messing with things again. New problems today:

emissary-apiext now has errors that look like:

"time="2025-10-14 15:50:39.5816" level=error msg="shut down with error error: secrets \"emissary-ingress-webhook-ca\" is forbidden: User \"system:serviceaccount:emissary-system:emissary-apiext\" cannot get resource \"secrets\" in API group \"\" in the namespace \"emissary-system\": RBAC: role.rbac.authorization.k8s.io \"emissary-apiext\" not found" func=github.com/emissary-ingress/emissary/v3/pkg/busy.Main file="/go/pkg/busy/busy.go:87" CMD=apiext PID=1"

emissary-ingress seemed to come up.
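
For reference, a rough sketch of what the missing Role/RoleBinding from that error would look like, in the same Terraform style used elsewhere in this thread. This is illustrative only: the resource names and the rule are inferred from the error message, and the emissary-apiext install normally manages this RBAC itself.

resource "kubernetes_role_v1" "emissary_apiext" {
  metadata {
    name      = "emissary-apiext"
    namespace = "emissary-system"
  }

  # Rule inferred from the error: apiext needs to read the webhook CA secret
  rule {
    api_groups     = [""]
    resources      = ["secrets"]
    resource_names = ["emissary-ingress-webhook-ca"]
    verbs          = ["get", "list", "watch"]
  }
}

resource "kubernetes_role_binding_v1" "emissary_apiext" {
  metadata {
    name      = "emissary-apiext"
    namespace = "emissary-system"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role_v1.emissary_apiext.metadata[0].name
  }

  subject {
    kind      = "ServiceAccount"
    name      = "emissary-apiext"
    namespace = "emissary-system"
  }
}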

chipkent avatar Oct 14 '25 15:10 chipkent

The prior report was an intermittent problem. emissary-apiext came up on the second try, and emissary-ingress is back to segfaulting.

chipkent avatar Oct 14 '25 16:10 chipkent

Hi @chipkent, looking into this a bit -- I assume you're using Helm chart version 8.12.2 with app version 3.12.2? Candidly, I don't know what's in that version: it was mistakenly generated by a broken Ambassador Labs CI script.

I'd strongly recommend that you drop back to 3.10.0 as described in the QuickStart. I'll also update the README to mention that 3.12.2 is not something that folks should use.

kflynn avatar Oct 14 '25 17:10 kflynn

@kflynn Thanks for your help on this. I went through and converted my Terraform to follow the quickstart you sent. There are still segfaults. Could you check whether this configuration looks correct?

inputs.tf:


variable "project_id" {
  description = "Project id"
}

variable "cluster_name" {
  description = "Cluster name"  
}

variable "namespace" {
  description = "Namespace to install emissary in"
  default = "emissary"
}

variable "chart_version" {
  description = "Emissary chart version (applies to both CRD and main chart). See https://emissary-ingress.dev/docs/3.10/quick-start/"
  default     = "3.10.0"
}

variable "dns_managed_zone" {
  description = "DNS managed zone for cluster ingress"
}

main.tf:


resource "kubernetes_namespace" "emissary" {
  metadata {
    name = var.namespace
  }
}

resource "helm_release" "emissary_crds" {
  name      = "emissary-crds"
  namespace = kubernetes_namespace.emissary.id
  chart     = "oci://ghcr.io/emissary-ingress/emissary-crds-chart"
  version   = var.chart_version

  // For fresh installs: skip legacy CRD versions and conversion webhook
  set {
    name  = "enableLegacyVersions"
    value = "false"
  }

  // Wait for CRDs to be ready before proceeding
  wait = true
}

resource "helm_release" "emissary" {
  name = "emissary-ingress"
  namespace = kubernetes_namespace.emissary.id
  chart     = "oci://ghcr.io/emissary-ingress/emissary-ingress"
  version   = var.chart_version

  // Don't wait for the conversion webhook, since legacy versions are disabled in the CRD chart
  set {
    name  = "waitForApiext.enabled"
    value = "false"
  }

  // Wait for Emissary to be ready
  wait = true

  // Ensure CRDs are installed before main chart
  depends_on = [helm_release.emissary_crds]
}

// data on the emissary ingress service launched by helm
data "kubernetes_service" "emissary" {
  metadata {
    name = helm_release.emissary.name
    namespace = kubernetes_namespace.emissary.id
  }

  depends_on = [helm_release.emissary]
}

data "google_dns_managed_zone" "default" {
  name = var.dns_managed_zone
  project = var.project_id
}

locals {
  // dns_name ends with a trailing dot; strip it to form the cluster's DNS root
  _dns_root_raw = "${var.cluster_name}.${data.google_dns_managed_zone.default.dns_name}"
  dns_root      = substr(local._dns_root_raw, 0, length(local._dns_root_raw) - 1)
}

resource "google_dns_record_set" "cluster" {
  project = data.google_dns_managed_zone.default.project
  name = "*.${local.dns_root}."
  type = "A"
  ttl  = 300

  managed_zone = data.google_dns_managed_zone.default.name

  rrdatas = [data.kubernetes_service.emissary.status.0.load_balancer.0.ingress.0.ip]
}
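
An output along these lines could surface the load balancer IP that the DNS record above points at (illustrative only; the output name is made up):

// Hypothetical convenience output exposing the ingress load balancer IP
output "emissary_ingress_ip" {
  value = data.kubernetes_service.emissary.status.0.load_balancer.0.ingress.0.ip
}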

Logs:

2025-10-17 14:08:12.895 PDT
v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
2025-10-17 14:08:12.903 PDT
v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
2025-10-17 14:08:13.272 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P16TMainThread] INFO: AMBASSADOR_FAST_RECONFIGURE enabled, initializing cache
2025-10-17 14:08:13.272 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P16TMainThread] INFO: WILL NOT update Mapping status
2025-10-17 14:08:13.274 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P16TMainThread] INFO: thread count 9, listening on 127.0.0.1:8004
2025-10-17 14:08:13.676 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P16TMainThread] INFO: Ambassador 3.10.0 booted
2025-10-17 14:08:13.681 PDT
Waited for 1.178873715s due to client-side throttling, not priority and fairness, request: GET:https://34.118.224.1:443/apis/getambassador.io/v3alpha1/mappings?limit=500&resourceVersion=0
2025-10-17 14:08:13.687 PDT
[2025-10-17 21:08:13 +0000] [16] [INFO] Starting gunicorn 23.0.0
2025-10-17 14:08:13.687 PDT
[2025-10-17 21:08:13 +0000] [16] [INFO] Listening at: http://127.0.0.1:8004 (16)
2025-10-17 14:08:13.687 PDT
[2025-10-17 21:08:13 +0000] [16] [INFO] Using worker: gthread
2025-10-17 14:08:13.691 PDT
[2025-10-17 21:08:13 +0000] [18] [INFO] Booting worker with pid: 18
2025-10-17 14:08:13.694 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P18TAEW] INFO: starting Scout checker and timer logger
2025-10-17 14:08:13.695 PDT
2025-10-17 21:08:13 diagd 3.10.0 [P18TAEW] INFO: starting event watcher
2025-10-17 14:08:13.894 PDT
"Unhandled Error" err="pkg/kates/client.go:469: Failed to watch *unstructured.Unstructured: can't watch endpointslices.v1.discovery.k8s.io: forbidden" logger="UnhandledError"
2025-10-17 14:08:14.534 PDT
2025-10-17 21:08:14 diagd 3.10.0 [P18TAEW] ERROR: Secret fallback-self-signed-cert.emissary unknown
2025-10-17 14:08:14.536 PDT
2025-10-17 21:08:14 diagd 3.10.0 [P18TAEW] INFO: EnvoyConfig: Generating V3
2025-10-17 14:08:14.537 PDT
2025-10-17 21:08:14 diagd 3.10.0 [P18TAEW] INFO: V3Ready: ==== listen on 127.0.0.1:8006
2025-10-17 14:08:14.537 PDT
2025-10-17 21:08:14 diagd 3.10.0 [P18TAEW] WARNING: No active listeners at all; check your Listener and Host configuration
2025-10-17 14:08:14.539 PDT
2025-10-17 21:08:14 diagd 3.10.0 [P18TAEW] INFO: configuration updated (incremental) from snapshot snapshot (S13 L0 G2 C1)
2025-10-17 14:08:14.539 PDT
time="2025-10-17 21:08:14.5395" level=info msg="started command [\"envoy\" \"-c\" \"/ambassador/bootstrap-ads.json\" \"--base-id\" \"0\" \"--drain-time-s\" \"600\" \"-l\" \"error\"]" func="github.com/datawire/dlib/dexec.(*Cmd).Start" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:183" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=24
2025-10-17 14:08:14.541 PDT
time="2025-10-17 21:08:14.5409" level=info msg="not logging input read from file \"/dev/stdin\"" func="github.com/datawire/dlib/dexec.(*Cmd).Start" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:185" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=24 dexec.stream=stdin
2025-10-17 14:08:14.541 PDT
time="2025-10-17 21:08:14.5410" level=info msg="not logging output written to file \"/dev/stdout\"" func="github.com/datawire/dlib/dexec.(*Cmd).Start" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:188" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=24 dexec.stream=stdout
2025-10-17 14:08:14.541 PDT
time="2025-10-17 21:08:14.5411" level=info msg="not logging output written to file \"/dev/stderr\"" func="github.com/datawire/dlib/dexec.(*Cmd).Start" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:191" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=24 dexec.stream=stderr
2025-10-17 14:08:14.550 PDT
time="2025-10-17 21:08:14.5498" level=info msg="Loaded file /ambassador/envoy/envoy.json" func=github.com/emissary-ingress/emissary/v3/pkg/ambex.Decode file="/go/pkg/ambex/main.go:281" CMD=entrypoint PID=1 THREAD=/ambex/main-loop
2025-10-17 14:08:14.580 PDT
time="2025-10-17 21:08:14.5800" level=info msg="Saved snapshot v1" func=github.com/emissary-ingress/emissary/v3/pkg/ambex.csDump file="/go/pkg/ambex/main.go:351" CMD=entrypoint PID=1 THREAD=/ambex/main-loop
2025-10-17 14:08:14.595 PDT
time="2025-10-17 21:08:14.5953" level=info msg="Pushing snapshot v1" func=github.com/emissary-ingress/emissary/v3/pkg/ambex.updaterWithTicker file="/go/pkg/ambex/ratelimit.go:159" CMD=entrypoint PID=1 THREAD=/ambex/updater
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x1acd9843e7e8
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: 628f5afc75a894a08504fa0f416269ec50c07bf9/1.31.4-dev/Clean/RELEASE/BoringSSL
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 58c614d06000-58c6187ae000 /usr/local/bin/envoy
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7ce6aac20320]
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #1: [0x58c6155ffe36]
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #2: [0x58c616563818]
2025-10-17 14:08:14.639 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #3: [0x58c61656031f]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #4: [0x58c61650e968]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #5: [0x58c61650fe51]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #6: [0x58c61650d74c]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.639][24][critical][backtrace] [./source/server/backtrace.h:121] #7: [0x58c61650e06e]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.640][24][critical][backtrace] [./source/server/backtrace.h:121] #8: [0x58c61650e1fc]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.640][24][critical][backtrace] [./source/server/backtrace.h:121] #9: [0x58c614d0614c]
2025-10-17 14:08:14.640 PDT
[2025-10-17 21:08:14.640][24][critical][backtrace] [./source/server/backtrace.h:121] #10: [0x7ce6aac0b510]
2025-10-17 14:08:14.643 PDT
time="2025-10-17 21:08:14.6433" level=info msg="finished with error: signal: segmentation fault" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:257" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=24
2025-10-17 14:08:14.643 PDT
time="2025-10-17 21:08:14.6434" level=error msg="goroutine \"/envoy\" exited with error: signal: segmentation fault" func="github.com/datawire/dlib/dgroup.(*Group).goWorkerCtx.func1.1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:380" CMD=entrypoint PID=1 THREAD=/envoy
2025-10-17 14:08:14.643 PDT
time="2025-10-17 21:08:14.6435" level=info msg="shutting down (gracefully)..." func="github.com/datawire/dlib/dgroup.(*Group).launchSupervisors.func1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:238" CMD=entrypoint PID=1 THREAD="/ambex:shutdown_logger"
2025-10-17 14:08:14.643 PDT
time="2025-10-17 21:08:14.6437" level=info msg="shutting down (gracefully)..." func="github.com/datawire/dlib/dgroup.(*Group).launchSupervisors.func1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:238" CMD=entrypoint PID=1 THREAD="/watcher:shutdown_logger"
2025-10-17 14:08:14.644 PDT
time="2025-10-17 21:08:14.6442" level=info msg="sending SIGINT" func="github.com/datawire/dlib/dexec.(*Cmd).Start.func1" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:211" CMD=entrypoint PID=1 THREAD=/diagd
2025-10-17 14:08:14.644 PDT
time="2025-10-17 21:08:14.6442" level=info msg="shutting down (gracefully)..." func="github.com/datawire/dlib/dgroup.(*Group).launchSupervisors.func1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:238" CMD=entrypoint PID=1 THREAD=":shutdown_logger"
2025-10-17 14:08:14.645 PDT
[2025-10-17 21:08:14 +0000] [16] [INFO] Handling signal: int
2025-10-17 14:08:14.649 PDT
time="2025-10-17 21:08:14.6495" level=info msg="Memory Usage 0.00Gi\n PID 1, 0.08Gi: busyambassador entrypoint \n PID 16, 0.04Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 \n PID 18, 0.04Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 " func="github.com/emissary-ingress/emissary/v3/pkg/memory.(*MemoryUsage).Watch" file="/go/pkg/memory/memory.go:43" CMD=entrypoint PID=1 THREAD=/memory
2025-10-17 14:08:14.746 PDT
[2025-10-17 21:08:14 +0000] [18] [INFO] Worker exiting (pid: 18)
2025-10-17 14:08:14.946 PDT
[2025-10-17 21:08:14 +0000] [16] [INFO] Shutting down: Master
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0461" level=info msg="finished successfully: exit status 0" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:255" CMD=entrypoint PID=1 THREAD=/diagd dexec.pid=16
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0465" level=info msg=" final goroutine statuses:" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:84" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0466" level=info msg=" /ambex : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0466" level=info msg=" /diagd : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0466" level=info msg=" /envoy : exited with error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0467" level=info msg=" /external_snapshot_server: exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0467" level=info msg=" /healthchecks : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0467" level=info msg=" /memory : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0467" level=info msg=" /snapshot_server : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0468" level=info msg=" /watcher : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
2025-10-17 14:08:15.046 PDT
time="2025-10-17 21:08:15.0468" level=error msg="shut down with error error: signal: segmentation fault" func=github.com/emissary-ingress/emissary/v3/pkg/busy.Main file="/go/pkg/busy/busy.go:87" CMD=entrypoint PID=1

chipkent avatar Oct 17 '25 21:10 chipkent

I've stripped some more out of the Terraform to minimize scope.

resource "kubernetes_namespace" "emissary" {
  metadata {
    name = var.namespace
  }
}

// Step 1: Install Emissary CRDs via Helm chart
// This installs only v3alpha1 CRDs and skips the conversion webhook for fresh installs
resource "helm_release" "emissary_crds" {
  name      = "emissary-crds"
  namespace = kubernetes_namespace.emissary.id
  chart     = "oci://ghcr.io/emissary-ingress/emissary-crds-chart"
  version   = var.chart_version

  // For fresh installs: skip legacy CRD versions and conversion webhook
  set {
    name  = "enableLegacyVersions"
    value = "false"
  }

  // Wait for CRDs to be ready before proceeding
  wait = true
}

// Step 2: Install Emissary main chart
// Depends on CRDs being installed first
resource "helm_release" "emissary" {
  name = "emissary-ingress"
  namespace = kubernetes_namespace.emissary.id
  chart     = "oci://ghcr.io/emissary-ingress/emissary-ingress"
  version   = var.chart_version

  // Don't wait for the conversion webhook, since legacy versions are disabled in the CRD chart
  set {
    name  = "waitForApiext.enabled"
    value = "false"
  }

  // Wait for Emissary to be ready
  wait = true

  // Ensure CRDs are installed before main chart
  depends_on = [helm_release.emissary_crds]
}

chipkent avatar Oct 17 '25 21:10 chipkent

I've created a concise Terraform reproducer:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}

# Variables
variable "project_id" {
  description = "GCP Project ID"
  type        = string
}

variable "region" {
  description = "GCP Region"
  type        = string
  default     = "us-central1"
}

variable "cluster_name" {
  description = "GKE Cluster Name"
  type        = string
  default     = "emissary-test"
}

variable "emissary_version" {
  description = "Emissary version"
  type        = string
  default     = "3.10.0"
}

# Providers
provider "google" {
  project = var.project_id
  region  = var.region
}

# Default VPC
data "google_compute_network" "default" {
  name = "default"
}

# Subnet for GKE cluster
resource "google_compute_subnetwork" "cluster" {
  name          = "${var.cluster_name}-subnet"
  region        = var.region
  network       = data.google_compute_network.default.id
  ip_cidr_range = "10.0.0.0/24"
}

# GKE Autopilot Cluster (exact same recipe as main project)
resource "google_container_cluster" "cluster" {
  name     = var.cluster_name
  location = var.region
  deletion_protection = false

  resource_labels = {
    cluster-role = "test"
    cluster-name = var.cluster_name
  }

  network    = data.google_compute_network.default.id
  subnetwork = google_compute_subnetwork.cluster.name

  ip_allocation_policy {}

  lifecycle {
    ignore_changes = [dns_config, gateway_api_config]
  }

  enable_autopilot = true

  release_channel {
    channel = "STABLE"
  }
}

# Providers configured after cluster creation
data "google_client_config" "provider" {}

provider "kubernetes" {
  host  = "https://${google_container_cluster.cluster.endpoint}"
  token = data.google_client_config.provider.access_token
  cluster_ca_certificate = base64decode(
    google_container_cluster.cluster.master_auth[0].cluster_ca_certificate,
  )
}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.cluster.endpoint}"
    token                  = data.google_client_config.provider.access_token
    cluster_ca_certificate = base64decode(
      google_container_cluster.cluster.master_auth[0].cluster_ca_certificate,
    )
  }
}

# Get cluster credentials
resource "null_resource" "get_credentials" {
  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${var.cluster_name} --region ${var.region} --project ${var.project_id}"
  }
  depends_on = [google_container_cluster.cluster]
}

# Emissary namespace
resource "kubernetes_namespace" "emissary" {
  metadata {
    name = "emissary"
  }
  depends_on = [null_resource.get_credentials]
}

# Step 1: Install Emissary CRDs (exact same as main project)
resource "helm_release" "emissary_crds" {
  name      = "emissary-crds"
  namespace = kubernetes_namespace.emissary.metadata[0].name
  chart     = "oci://ghcr.io/emissary-ingress/emissary-crds-chart"
  version   = var.emissary_version

  set {
    name  = "enableLegacyVersions"
    value = "false"
  }

  wait = true
}

# Step 2: Install Emissary (exact same as main project)
resource "helm_release" "emissary" {
  name      = "emissary-ingress"
  namespace = kubernetes_namespace.emissary.metadata[0].name
  chart     = "oci://ghcr.io/emissary-ingress/emissary-ingress"
  version   = var.emissary_version

  set {
    name  = "waitForApiext.enabled"
    value = "false"
  }

  wait = true

  depends_on = [helm_release.emissary_crds]
}

# Outputs
output "cluster_name" {
  value = google_container_cluster.cluster.name
}

output "cluster_endpoint" {
  value = google_container_cluster.cluster.endpoint
}

output "emissary_namespace" {
  value = kubernetes_namespace.emissary.metadata[0].name
}
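
A terraform.tfvars along these lines should drive the reproducer (the project ID is a placeholder; the remaining values just restate the defaults above):

# Hypothetical terraform.tfvars; project_id is a placeholder
project_id       = "my-gcp-project"
region           = "us-central1"
cluster_name     = "emissary-test"
emissary_version = "3.10.0"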

chipkent avatar Oct 17 '25 22:10 chipkent

Update: I changed from Autopilot to a standard cluster, and everything works. Below, I commented out enable_autopilot and gave the cluster an initial node count.

# GKE Standard cluster (same recipe as main project, but with Autopilot disabled)
resource "google_container_cluster" "cluster" {
  name     = var.cluster_name
  location = var.region
  deletion_protection = false

  resource_labels = {
    cluster-role = "test"
    cluster-name = var.cluster_name
  }

  network    = data.google_compute_network.default.id
  subnetwork = google_compute_subnetwork.cluster.name

  ip_allocation_policy {}

  lifecycle {
    ignore_changes = [dns_config, gateway_api_config]
  }

  # enable_autopilot = true
  initial_node_count = 5

  release_channel {
    channel = "STABLE"
  }
}

chipkent avatar Oct 20 '25 19:10 chipkent

I've done quite a bit of back and forth with GCP. This appears to be a problem on their side. See below for details and how to work around the problem:

Hello Chip,


Thank you again for your patience and for the steps you provided.


Based on the reproduction of the GitHub issue and our internal documentation, the persistent Envoy segmentation fault is localized to the specific machine type that GKE Autopilot provisions by default.


I checked the logs in your project, and this is the sequence of events:

- The crash occurs when your pod is scheduled onto the EK machine series.
- The EK machine's underlying Linux kernel and OS tuning, combined with the mandatory SeccompProfile: RuntimeDefault policy enforced by Autopilot, triggers a fatal system call failure in the C++ Envoy binary during startup.
- Our final test successfully stabilized the workload by forcing it onto the N2 machine family, achieving 1/1 Running status with zero restarts.

A workaround is to instruct Autopilot to bypass the EK machine family and use a stable alternative. Since your environment is managed by Terraform/Helm, the fix can be applied directly in your Pod specification via the node selector mechanism [1].


We recommend targeting the N2 machine family as it proved to be stable and offers excellent price-performance. [2]


# This configuration might be added to your emissary values: (example)


nodeSelector:
  # Explicitly target the stable N2 machine family
  cloud.google.com/machine-family: n2


I understand your desire to maintain the lowest operational cost, as Autopilot was preferred for its optimized billing. Unfortunately, because the EK machine is the source of the crash, we must bypass it. While we have this immediate fix, the underlying issue is a regression. I can confirm that the product and engineering teams are aware of this situation, as other internal reports point to the EK machine conflict. For now, there is no public ETA for a platform patch, so using the N2 machine family selector (or another family that suits your needs) is the necessary solution. This issue is being tracked as a platform regression impacting customers who rely on this specific behavior. The node selector fix is the official workaround until a platform patch is released for the EK machine.


Also, per your last response, you can keep your cluster down for now. These are my findings from this investigation; let me know if you need any assistance or have any other questions.


Regards,
Luis A.
Google Cloud Support
Working Hours: 11:00 - 18:00 CST (UTC-6)

[1] https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
[2] https://docs.cloud.google.com/compute/docs/machine-resource

chipkent avatar Nov 04 '25 16:11 chipkent

Here is Terraform that works around the problem:

resource "helm_release" "emissary" {
  name      = "emissary-ingress"
  namespace = kubernetes_namespace.emissary.metadata[0].name
  chart     = "oci://ghcr.io/emissary-ingress/emissary-ingress"
  version   = var.emissary_version

  set {
    name  = "waitForApiext.enabled"
    value = "false"
  }

  # Optional nodeSelector to target the N2 machine family (workaround for EK machine segfaults)
  set {
    name  = "nodeSelector.cloud\\.google\\.com/machine-family"
    value = "n2"
  }

  wait = true

  depends_on = [helm_release.emissary_crds]
}
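
An alternative sketch that avoids escaping the dots in the set key is to pass the nodeSelector through the release's values instead (yamlencode and the values argument are standard Terraform / helm-provider features; whether this reads better than set is a style call):

resource "helm_release" "emissary" {
  name      = "emissary-ingress"
  namespace = kubernetes_namespace.emissary.metadata[0].name
  chart     = "oci://ghcr.io/emissary-ingress/emissary-ingress"
  version   = var.emissary_version

  set {
    name  = "waitForApiext.enabled"
    value = "false"
  }

  # Same EK-machine workaround, expressed as values instead of an escaped set key
  values = [yamlencode({
    nodeSelector = {
      "cloud.google.com/machine-family" = "n2"
    }
  })]

  wait = true

  depends_on = [helm_release.emissary_crds]
}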

chipkent avatar Nov 04 '25 19:11 chipkent