apisix-ingress-controller
bug: If the admin API responds too slowly, it cannot guarantee the order in which requests are processed
Issue description
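In pkg/ingress/endpoint.go, the syncEndpoint method
c.controller.syncEndpoint(ctx, newestEp)
calls the dashboard API to update the upstream IP information:
resp, err := u.cluster.updateResource(ctx, url, "upstream", bytes.NewReader(body))
The call has a default timeout of 5s. When it times out, an error is returned and the next sync of the endpoint resource starts immediately. For an HTTP request, however, a client-side timeout does not affect processing on the server: the server keeps handling the request. If the next sync targets the same endpoint resource and its update-upstream call gets a quick response, the old IP list may overwrite the new one.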
Environment
- your apisix-ingress-controller version (output of apisix-ingress-controller version --long):
- your Kubernetes cluster version (output of kubectl version):
- if you run apisix-ingress-controller in Bare-metal environment, also show your OS version (uname -a):
Minimal test code / Steps to reproduce
Actual result
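First sync of endpoint-A: the IP list is ip1, ip2, ip3. If the call to the dashboard update-upstream API (request 1) times out, the client returns immediately and moves on to the next sync, while the server keeps processing request 1.
Second sync of endpoint-A: the IP list is ip1, ip2, and the dashboard update-upstream API is called again (request 2).
Because the dashboard cannot guarantee that request 1 finishes before request 2, the data from request 1 may overwrite the data from request 2, leaving the upstream in an incorrect state.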
Error log
none
Expected result
No response
APISIX's admin API used by APISIX Ingress, not dashboard.
Did you actually encounter this problem?
In fact, in APISIX Ingress, there is periodic resynchronization. If the APISIX admin API frequently times out in the user's environment, it is recommended to increase this frequency.
Yes, we hit this problem in our production environment. I am still investigating why the admin API responds slowly.
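I have tracked down why the admin API was slow. Every time the admin API sends a request to etcd it calls etcd.new(), and that method always connects to the first node listed in the configuration file (issue), so all connections ended up on a single etcd node. The max open files limit on the nodes of this etcd cluster was left at the system default of 1024, so responses slowed down once there were too many connections. After we raised max open files, the problem went away.
I spent a long time trying to write this in English and could not finish it, so I am sticking with Chinese.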
Thanks! I see that the issue has been resolved and the PR has been merged. It should be enabled in newer versions of APISIX (2.15.1
I saw the same problem reported for Kong.
https://mp.weixin.qq.com/s/NpbJhqFpQkqIgCOqmCMZRA
https://github.com/Kong/kong/issues/7543
May I know how many services/routes you had in Kong when the problem occurred? We need some information to decide between Kong and APISIX. Thanks for your help 🙏
This issue has been marked as stale due to 90 days of inactivity. It will be closed in 30 days if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.