apisix-ingress-controller
bug: Enabling the APISIX ingress controller can lead to a surge in CPU usage on the APISIX gateway.
Current Behavior
Enabling the APISIX ingress controller can lead to a surge in CPU usage on the APISIX gateway.
Turning off the APISIX ingress controller resolves the issue.
The issue reproduces consistently and reliably in my cluster: CPU usage spikes for three minutes, then returns to a lower level for three minutes, then rises again, repeating in three-minute cycles. The gateway is almost unusable and operations are severely affected, with service latency increasing from 20 ms to 2000 ms.
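The alternating three-minute pattern described above can be checked mechanically from CPU samples (e.g. scraped from metrics). A minimal sketch, with an invented threshold and synthetic data rather than real gateway metrics:

```python
# Sketch: collapse (timestamp_seconds, cpu_percent) samples into
# (is_high, duration_seconds) phases to confirm a periodic surge.
# The 80% threshold and the sample data are illustrative assumptions.

def phases(samples, threshold=80.0):
    """Collapse samples into (is_high, duration_seconds) phases."""
    out = []
    start, high = samples[0][0], samples[0][1] >= threshold
    prev_t = start
    for t, cpu in samples[1:]:
        now_high = cpu >= threshold
        if now_high != high:
            out.append((high, prev_t - start))
            start, high = t, now_high
        prev_t = t
    out.append((high, prev_t - start))
    return out

# Synthetic data: ~3 min at 95% CPU, then ~3 min at 20%, repeated,
# sampled every 10 seconds.
samples = [(t, 95.0 if (t // 180) % 2 == 0 else 20.0) for t in range(0, 720, 10)]
print(phases(samples))  # four alternating phases of roughly 3 minutes each
```

If the phase durations line up with the controller's sync cadence, that points at full synchronization rather than organic traffic.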
Expected Behavior
No response
Error Logs
2023-10-27T11:11:29+08:00 error ingress/ingress.go:148 failed to translate ingress {"error": "endpoints: endpoints "j3t46i" not found", "ingress": {}}
2023-10-27T11:11:29+08:00 warn ingress/ingress.go:268 sync ingress failed, will retry {"object": {"Type":1,"Object":{"Key":"j3t46i/j3t46i","GroupVersion":"networking/v1","OldObject":null},"OldObject":null,"Tombstone":null}, "error": "endpoints: endpoints "j3t46i" not found"}
2023-10-27T11:11:29+08:00 error ingress/ingress.go:472 failed to get APISIX gateway external IPs {"error": "resource name may not be empty"}
2023-10-27T11:11:29+08:00 error translation/translator.go:158 failed to translate ingress backend to upstream {"error": "endpoints: endpoints "dz24qv" not found", "ingress": "&Ingress{ObjectMeta:{dz24qv dz24qv 37023cf1-b564-46cc-8779-77ed62cf901b 295270932 1 2023-10-19 20:31:45 +0800 HKT
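The log lines above repeatedly fail to resolve Endpoints objects for Ingress backends. One quick way to see how many distinct stale backends are involved is to extract the missing Endpoints names from the controller log. A small sketch (the sample lines are abbreviated copies of the logs above):

```python
import re

# Sketch: pull the missing Endpoints names out of
# apisix-ingress-controller log lines to find stale Ingress backends.
PATTERN = re.compile(r'endpoints: endpoints "([\w-]+)" not found')

def missing_endpoints(log_lines):
    found = set()
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            found.add(m.group(1))
    return sorted(found)

logs = [
    'failed to translate ingress {"error": "endpoints: endpoints "j3t46i" not found", "ingress": {}}',
    'failed to translate ingress backend to upstream {"error": "endpoints: endpoints "dz24qv" not found"}',
    'sync ingress failed, will retry {"error": "endpoints: endpoints "j3t46i" not found"}',
]
print(missing_endpoints(logs))  # distinct missing Endpoints names
```

Each name can then be checked with `kubectl get endpoints <name> -n <namespace>` to confirm whether the backing Service still exists.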
Steps to Reproduce
Only my cluster.
Environment
- APISIX Ingress controller version (run apisix-ingress-controller version --long):
  image: apache/apisix-ingress-controller:1.6.0 (the apisix-ingress-controller command is not found in this Docker image)
- Kubernetes cluster version (run kubectl version):
  WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
  Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.6", GitCommit:"ff2c119726cc1f8926fb0585c74b25921e866a28", GitTreeState:"clean", BuildDate:"2023-01-18T19:22:09Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
  Kustomize Version: v4.5.7
  Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.6", GitCommit:"ff2c119726cc1f8926fb0585c74b25921e866a28", GitTreeState:"clean", BuildDate:"2023-01-18T19:15:26Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
- OS version if running APISIX Ingress controller in a bare-metal environment (run uname -a):
  Linux laf-canary-master001 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Thanks for your report.
What's your apisix-ingress-controller config?
- ingress pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apisix-ingress-controller
  namespace: ingress-apisix
  uid: c317229e-1f3a-4d85-b3ae-4be21d756ace
  resourceVersion: '216541530'
  generation: 1
  creationTimestamp: '2023-03-31T02:37:52Z'
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-controller
    app.kubernetes.io/version: 1.6.0
    helm.sh/chart: ingress-controller-0.11.4
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: ingress-apisix
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: apisix
      app.kubernetes.io/name: ingress-controller
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: apisix
        app.kubernetes.io/name: ingress-controller
      annotations:
        checksum/config: dc595ec92c5fdc9f40170836cf8831cff9c2aeb820a6d590c02912d518747607
    spec:
      volumes:
        - name: configuration
          configMap:
            name: apisix-configmap
            items:
              - key: config.yaml
                path: config.yaml
            defaultMode: 420
      initContainers:
        - name: wait-apisix-admin
          image: busybox:1.28
          command:
            - sh
            - '-c'
            - >-
              until nc -z apisix-admin.ingress-apisix.svc.cluster.local 9180 ;
              do echo waiting for apisix-admin; sleep 2; done;
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext: {}
      containers:
        - name: ingress-controller
          image: apache/apisix-ingress-controller:1.6.0
          command:
            - /ingress-apisix/apisix-ingress-controller
            - ingress
            - '--config-path'
            - /ingress-apisix/conf/config.yaml
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
          resources: {}
          volumeMounts:
            - name: configuration
              mountPath: /ingress-apisix/conf
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: apisix-ingress-controller
      serviceAccount: apisix-ingress-controller
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
- configMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: apisix-configmap
  namespace: ingress-apisix
  uid: eae5df62-47b2-4a86-b1df-188834f1e397
  resourceVersion: '383912'
  creationTimestamp: '2023-03-31T02:37:52Z'
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-controller
    app.kubernetes.io/version: 1.6.0
    helm.sh/chart: ingress-controller-0.11.4
  annotations:
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: ingress-apisix
data:
  config.yaml: |-
    # log options
    log_level: "info"
    log_output: "stderr"
    cert_file: "/etc/webhook/certs/cert.pem"
    key_file: "/etc/webhook/certs/key.pem"
    http_listen: ":8080"
    https_listen: ":8443"
    ingress_publish_service: ""
    enable_profiling: true
    apisix-resource-sync-interval: 1h
    kubernetes:
      kubeconfig: ""
      resync_interval: "6h"
      namespace_selector:
        - ""
      election_id: "ingress-apisix-leader"
      ingress_class: "apisix"
      ingress_version: "networking/v1"
      watch_endpointslices: false
      apisix_route_version: "apisix.apache.org/v2"
      enable_gateway_api: false
      apisix_version: "apisix.apache.org/v2"
      plugin_metadata_cm: ""
    apisix:
      admin_api_version: "v3"
      default_cluster_base_url: http://apisix-admin.ingress-apisix.svc.cluster.local:9180/apisix/admin
      default_cluster_admin_key: "xxxx---xxx-xx-x-x-x"
      default_cluster_name: "default"
apisix-resource-sync-interval: 1h is too short for hundreds, even thousands, of resources.
I see hundreds of creation events in the 3-minute log file. Since the events generated by a full synchronization in 1.6 are also creation events, I can't tell whether they are sync events or true resource creation events. If they are sync events, the resource sync interval should be increased; otherwise, the CP/DP isolated deployment mode is required.
Also, some auto-renewed resources are not excluded correctly. For example, the log shows openebs.io-local raising an update event every 2 seconds.
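If the events turn out to be full-sync events, the simplest mitigation is to raise the full-sync interval in the controller's config.yaml. A sketch of the relevant fragment; the 12h value is illustrative, and should be chosen based on how quickly drift between the CRDs and APISIX needs to be repaired:

```yaml
# config.yaml fragment (illustrative interval, not a recommendation)
apisix-resource-sync-interval: 12h
```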
@lingsamuel Do we have a technical solution to reduce the requests to the CP side when endpoint resources are updated?
For example, caching?
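One shape the caching idea could take: hash the translated upstream node set and skip the Admin API call when nothing actually changed, so endpoint churn that resolves to the same node set generates no CP traffic. A minimal sketch of the idea only; all names here are invented for illustration, and the real controller is written in Go:

```python
import hashlib
import json

# Sketch of the caching idea: only push an upstream to the control plane
# when its node set actually changed. Names are invented for illustration.
class UpstreamPushCache:
    def __init__(self):
        self._seen = {}  # upstream name -> digest of last pushed node set

    def should_push(self, name, nodes):
        # Canonicalize so node ordering doesn't cause spurious pushes.
        payload = json.dumps(sorted(nodes), sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        if self._seen.get(name) == digest:
            return False  # unchanged: skip the Admin API call
        self._seen[name] = digest
        return True

cache = UpstreamPushCache()
nodes = ["10.0.0.1:8080", "10.0.0.2:8080"]
print(cache.should_push("svc-a", nodes))                      # True: first sight
print(cache.should_push("svc-a", list(reversed(nodes))))      # False: same set, reordered
print(cache.should_push("svc-a", nodes + ["10.0.0.3:8080"]))  # True: real change
```

The design choice here is to pay a cheap local hash on every endpoint event in exchange for dropping the expensive remote write when the event is a no-op, which is exactly the 2-second auto-renew case mentioned above.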
@maslow
I tried to reproduce this problem on an Alibaba Cloud ACK cluster by creating 3500 ApisixRoute and ApisixTls resources, but found no periodic high load.
ping @maslow
This issue has been marked as stale due to 90 days of inactivity. It will be closed in 30 days if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.
We also found the same problem: when apisix-ingress-controller compares the declarative APISIX CRD configuration against the data in etcd, its requests to APISIX cause APISIX CPU usage to go up, then come back down after a while.
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 799f92e3 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 1668e488 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 2059c8b3 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 97d60823 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 6d49039a in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 47eee264 in APISIX but do not in declare yaml
2024-04-26T19:34:15+08:00 warn ingress/compare.go:186 pluginConfig: 16cc31 in APISIX but do not in declare yaml
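The warnings above come from the startup/periodic compare walking objects that exist in APISIX but have no CRD counterpart. Counting distinct orphaned objects per resource type from the log gives a rough sense of how much Admin API work each compare cycle generates. A small sketch over abbreviated copies of the lines above:

```python
import re

# Sketch: count distinct orphaned objects per resource type from
# ingress/compare.go warnings, to estimate per-cycle compare work.
WARN = re.compile(r'(\w+): (\w+) in APISIX but do not in declare yaml')

def orphans_by_type(lines):
    counts = {}
    for line in lines:
        m = WARN.search(line)
        if m:
            counts.setdefault(m.group(1), set()).add(m.group(2))
    return {k: len(v) for k, v in counts.items()}

lines = [
    "warn ingress/compare.go:186 pluginConfig: 799f92e3 in APISIX but do not in declare yaml",
    "warn ingress/compare.go:186 pluginConfig: 1668e488 in APISIX but do not in declare yaml",
    "warn ingress/compare.go:186 pluginConfig: 799f92e3 in APISIX but do not in declare yaml",
]
print(orphans_by_type(lines))  # distinct orphan count per resource type
```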
I'm experiencing the same scenario; please take a look. As soon as I restart apisix-ingress-controller, APISIX CPU goes up and then comes back down. We have about 5k+ ApisixRoutes and 5k+ upstreams. @shreemaan-abhishek @Sn0rt APISIX version: 2.13.0; apisix-ingress-controller version: 1.4.1