OOM killed
Description:
Envoy Gateway watches some HTTPRoutes that have errors (for example, the backend Service does not exist), and the EG pod has been OOM killed while reconciling them.
Logs:
IMO, we should set RequeueAfter to requeue these routes.
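A minimal controller-runtime sketch of that suggestion, assuming a hypothetical reconciler (this is not Envoy Gateway's actual code): when the referenced backend is missing, return a RequeueAfter delay instead of an error, so the route is retried on a timer rather than kept hot on the workqueue.

```go
package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// routeReconciler is a hypothetical reconciler used only to illustrate the
// RequeueAfter idea; the real Envoy Gateway reconciler is structured differently.
type routeReconciler struct {
	client.Client
}

func (r *routeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Assume the route's backendRef points at a Service named "backend" in the
	// route's namespace (placeholder; the real code resolves each backendRef).
	var svc corev1.Service
	err := r.Get(ctx, types.NamespacedName{Namespace: req.Namespace, Name: "backend"}, &svc)
	if apierrors.IsNotFound(err) {
		// The backend does not exist yet: retry after a fixed delay instead of
		// returning the error, which would put the item straight back on the
		// workqueue and keep the controller churning on broken routes.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	if err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```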
I could not reproduce it; can you provide the steps to reproduce it? @qicz
It could be that we are missing a valid error return on getting those resources
Create one HTTPRoute whose backend Service does not exist.
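For example, a sketch of that reproduction using the Gateway API Go types (the Gateway name eg, the namespace, and the Service name are placeholders; applying the equivalent YAML manifest works the same way):

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
	gwapiv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// brokenRoute builds an HTTPRoute whose backendRef names a Service that does
// not exist, which is the reproduction case described above.
func brokenRoute() *gwapiv1.HTTPRoute {
	return &gwapiv1.HTTPRoute{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "broken-route",
			Namespace: "default",
		},
		Spec: gwapiv1.HTTPRouteSpec{
			CommonRouteSpec: gwapiv1.CommonRouteSpec{
				ParentRefs: []gwapiv1.ParentReference{{Name: "eg"}},
			},
			Rules: []gwapiv1.HTTPRouteRule{{
				BackendRefs: []gwapiv1.HTTPBackendRef{{
					BackendRef: gwapiv1.BackendRef{
						BackendObjectReference: gwapiv1.BackendObjectReference{
							Name: "does-not-exist", // no such Service in the cluster
							Port: ptr.To(gwapiv1.PortNumber(80)),
						},
					},
				}},
			}},
		},
	}
}
```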
Tried that, and the HTTPRoute just reported BackendNotFound; EG still works well.
This gets reported too often, and when there are more invalid HTTPRoutes, EG gets killed while reconciling them.
@qicz I'm facing the same error here. But my setup has about ~1300 HTTPRoute CRs with about ~20 Gateways, with mergeGateway=true. Maybe it's not the non-existent backends that cause the EG OOM but the number of Gateway API CRs; I'm seeing the envoy-gateway deployment pod eat too much memory.
The default EG memory limit is 1Gi. You can change it to unlimited, but the underlying problem remains.
@qicz in your logs, can you please paste the entire log showing the namespace and name of the Service, along with kubectl info on the Service as well as the HTTPRoute that is linking to it?
@arkodg sorry for the slow reply. The namespace and service names are from my company's app, so I cleared them from the logs. Sorry about that.
In my case there are only ~30 HTTPRoutes, but I cannot set the memory to unlimited; that is bad for the Kubernetes cluster.
It doesn't need to be unlimited; a limit sized for the number of routes is enough. But either way, something must be wrong to cause the OOM here.
@qicz @zzjin can you outline the steps to reproduce the problem? From this chat it's hard to understand what the trigger is.
My analysis concludes that the OOM problem is that there are many Secrets and the memory limit is not set properly.
Suggestion: connect to Kubernetes using protobuf to optimize memory. xref #1596
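For context, a minimal client-go sketch of what "connect to Kubernetes using protobuf" means (illustrative only; protobuf encoding applies to built-in types such as Secrets, not to CRDs, and #1596 contains the actual change):

```go
package example

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newProtobufClientset asks the API server for protobuf-encoded responses
// instead of JSON, which reduces decode overhead and memory churn when
// watching many objects (for example, thousands of Secrets).
func newProtobufClientset() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// Prefer protobuf, fall back to JSON for resources that do not support it.
	cfg.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
	cfg.ContentType = "application/vnd.kubernetes.protobuf"
	return kubernetes.NewForConfig(cfg)
}
```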
Maybe that's the problem; in our cluster we have about ~3000 Ingresses with HTTPS, which means about ~3000 Secrets.
@qicz can you share mem stats of EG before & after https://github.com/envoyproxy/gateway/pull/1596 ?
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
closing due to no response, please reopen if you hit this issue again
Hi @arkodg, I've hit the same issue.
It seems like the envoy gateway is creating infinite HTTPRoutes for the HTTP01 challenge while the challenge is not satisfied. My (unproven) theory is that it is provisioning the HTTPRoute resource with generate_name instead of using a predictable name, and this causes an infinite reconciliation loop.
EDIT: by looking at the HTTPRoute owner references, this now looks like a cert-manager issue
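To illustrate the suspected mechanism (hypothetical names; the real objects are cert-manager's HTTP01 solver resources): with generateName the API server assigns a fresh name on every Create, so a controller that recreates anything it cannot find by name never converges on a single object.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureSolverObject sketches the suspected failure mode. Because GenerateName
// yields a different server-assigned name on each Create, the next reconcile
// has no stable name to Get(), concludes the object is missing, and creates
// yet another one.
func ensureSolverObject(ctx context.Context, c client.Client) error {
	obj := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Namespace:    "default",
			GenerateName: "http01-solver-", // becomes http01-solver-<random suffix>
		},
	}
	// Without a deterministic name (or a lookup by labels / owner references),
	// this Create runs again on every reconcile and the objects pile up.
	return c.Create(ctx, obj)
}
```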
thanks for debugging this one @miguelvr , cross linking the cert-manager issue here https://github.com/cert-manager/cert-manager/issues/7176
@envoyproxy/gateway-maintainers should we consider something like envoy's overload manager, where we stop reconciling more resources (and flag this in GatewayClass status) in case we hit some specified memory threshold?
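A rough sketch of what that could look like (every name and the threshold here are hypothetical; nothing like this exists in Envoy Gateway today): check the process's heap usage before taking on more reconcile work and shed load once a threshold is crossed.

```go
package example

import (
	"context"
	"runtime"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// memoryGate is a hypothetical guard inspired by Envoy's overload manager:
// above maxHeapBytes the controller stops taking on new reconcile work and
// would flag the condition in GatewayClass status instead of being OOM killed.
type memoryGate struct {
	maxHeapBytes uint64
}

func (g memoryGate) overloaded() bool {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc > g.maxHeapBytes
}

// guardedReconcile wraps a real reconcile function with the gate.
func guardedReconcile(ctx context.Context, g memoryGate,
	inner func(context.Context) (ctrl.Result, error)) (ctrl.Result, error) {
	if g.overloaded() {
		// Shed load: retry later and (not shown here) set a GatewayClass
		// condition so operators can see that reconciliation is throttled.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}
	return inner(ctx)
}
```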