apisix-ingress-controller
bug: The performance of ingress-controller's event handling
Issue description
After I created 3000+ ApisixRoute objects in the cluster, whenever apisix-ingress-controller starts or restarts (possibly after an OOM kill), resource synchronization takes a very long time because of client-go's rate limit. Endpoint changes that happen during this window are not synchronized in time, which causes 502 errors.

A related problem: endpoint changes do not seem to be synchronized to APISIX promptly. My guess is that the path of this control loop is too long: watchEndpoints -> translate -> apisix-admin-api -> etcd
To sum up, with this kind of performance it definitely cannot be put into production. Would it be better for APISIX to do service discovery by itself? https://github.com/apache/apisix/pull/4880
Environment
- your apisix-ingress-controller version (output of apisix-ingress-controller version --long): 2.11.0
- your Kubernetes cluster version (output of kubectl version): .1.18.8
- if you run apisix-ingress-controller in Bare-metal environment, also show your OS version (uname -a):
Minimal test code / Steps to reproduce
Actual result
Error log
Expected result
Endpoints can be watched in time
Maybe I’ve said too much, I just think it’s better if the service discovery is done by apisix itself
Thanks for your report.
- your apisix-ingress-controller version (output of apisix-ingress-controller version --long): 2.11.0
What's your apisix-ingress-controller version? The latest version is v1.4 (not yet released).
The problem you encountered is somewhat similar to #806 and https://github.com/apache/apisix-ingress-controller/pull/760
Before #706, we were using workqueue.AddRateLimited, which caused some problems.
The bug arises because a single workqueue is shared by all objects of the same resource type, and a rate-limit mechanism is applied to every add to this workqueue. Rate limiting is only needed when retrying after a failure; when a resource changes normally, the item should be added to the workqueue immediately so that it can be processed.
Yes, before #706 we did indeed have this issue.
It's v1.4. I mistakenly provided the APISIX version.
How did you install the ingress controller? Using Helm?
I want to know if you would consider combining apisix-ingress-controller with this approach: https://github.com/apache/apisix/pull/4880
Yes, I made my own Helm chart, because at that time the official chart only supported up to 1.3.0.
I want to know if you would consider combining apisix-ingress-controller with this approach: apache/apisix#4880
It has not been put into the current roadmap.
Could we have an online meeting? I would like to understand the specific problems you are currently encountering and your thoughts.
Please help confirm whether your own ingress-controller image contains bugfix #760.
Okay. How about we schedule it for next week, so that I can prepare a brief summary first?
Sure. Because of the holiday, how about next Tuesday at 14:00? Or another time when you are free.
Emailed you. How about 5pm? I have other arrangements earlier.
ok.
After discussing with @crazyMonkey1995, he is currently encountering the following problems:
- He encountered some 502 errors during rolling updates of a large number of instances (no health check is configured). The main concern here is that the endpoint update is not fast enough.
  I think there are two pieces of information that need attention:
  - Health checks are very helpful for Apache APISIX to remove nodes in time.
  - In #760, we fixed the usage of the workqueue so that it is no longer rate-limited, so endpoints can be updated more quickly.
  Action item:
  - Perform stress testing to cover this scenario. @tao12345666333
- The problem of APISIX Ingress controller resource limiting.
  - #760 solves this problem and has been released in v1.4.
- In the single-instance APISIX scenario, APISIX Ingress controller cannot re-establish a connection with the dead APISIX instance.
  - https://github.com/apache/apisix-ingress-controller/pull/774 solves this problem and has been released in v1.4.
https://github.com/apache/apisix-ingress-controller/pull/760#issuecomment-1005503358
This fix cannot solve the problem in some scenarios. For example, suppose there are two instances of apisix-ingress-controller. When the leader goes down for some reason and the other instance becomes the leader, the new leader's handling of resource events is blocked by "client-side throttling" during the resource list phase. Because apisix-ingress-controller has not completed the list phase at that point, the control loop is necessarily blocked.
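The blocking described in this comment can be illustrated with a small, standard-library-only sketch. It is not the controller's code: `runLoop`, `simulate`, and the `synced` channel are hypothetical names that model a control loop which may only start handling events once the initial list phase has finished.

```go
package main

import "fmt"

// runLoop models a control loop that must wait for the initial list phase
// (signalled by closing synced) before it handles any event.
func runLoop(synced <-chan struct{}, events <-chan string, handled chan<- string) {
	<-synced // stays blocked for as long as the list phase is throttled
	for e := range events {
		handled <- e
	}
	close(handled)
}

// simulate queues the pending events while the list phase is still running
// and reports how many were handled before and after the list completed.
func simulate(pending []string) (before, after int) {
	synced := make(chan struct{})
	events := make(chan string, len(pending))
	handled := make(chan string, len(pending))
	go runLoop(synced, events, handled)

	for _, e := range pending {
		events <- e
	}
	close(events)

	before = len(handled) // the loop is still waiting on synced
	close(synced)         // the list phase finally completes
	for range handled {
		after++
	}
	return before, after
}

func main() {
	before, after := simulate([]string{"endpoints/a", "endpoints/b"})
	fmt.Println("handled during list phase:", before) // 0
	fmt.Println("handled after list phase: ", after)  // 2
}
```

In this model, the latency of every endpoint event that arrives during the list phase is at least the remaining duration of that phase, which is why a throttled list directly delays endpoint updates.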
@crazyMonkey1995 I have modified the title; all of these problems can be considered related to the efficiency of event processing in the APISIX Ingress controller.
We have entered the v1.5 release window. You can use the latest code now, or wait until v1.5 is released, to test and verify.
UPDATE: Tested with the latest code (commit reference: dfcbaac8f2b8c9c5ece12e3454fa57a2a23dba65):
- There are 50 replicas of endpoints
- Use ab for 100 concurrent requests
- rollout restart the deployment (resulting in rolling update of endpoints)
- Multiple experiments did not reproduce the performance problem of endpoints update
Thanks for the update.
So can we consider this issue fixed and close it?
Okay.
