kong icon indicating copy to clipboard operation
kong copied to clipboard

fix upstream_dict nil while updating an upstream

Open locao opened this issue 3 years ago • 5 comments

Summary

This PR was originally submitted by @ranxuxin. I just rebased it to be part of the release 2.3.0, below is their original description:

Hi everyone,

clients requests return 500 while using admin API to change an upstream with "eventual" worker_consistency. Although (#6104) fixes this bug, clients requests return 503 instead of return 500. However, either 500 or 503 are not acceptable on production environment.

The reason is that singletons.core_cache:invalidate_local("balancer:upstreams") and singletons.core_cache:get("balancer:upstreams") are not atomic in do_upstream_event() and update_balancer_state(). When clients requests come between this two steps, kong throws stack traceback. As below:

2020/11/10 14:36:27 [error] 2625#0: *8424 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/runloop/balancer.lua:672: attempt to index local 'upstreams_dict' (a nil value) stack traceback: coroutine 0: /usr/local/share/lua/5.1/kong/runloop/balancer.lua: in function 'get_upstream_by_name' /usr/local/share/lua/5.1/kong/runloop/balancer.lua:693: in function 'get_balancer' /usr/local/share/lua/5.1/kong/runloop/balancer.lua:1090: in function 'execute' /usr/local/share/lua/5.1/kong/runloop/handler.lua:891: in function 'balancer_execute' /usr/local/share/lua/5.1/kong/runloop/handler.lua:1264: in function 'after' /usr/local/share/lua/5.1/kong/init.lua:788: in function 'access' access_by_lua(nginx-kong.conf:80):2: in main chunk, client: 127.0.0.1, server: kong, request: "GET / HTTP/1.1", host: "www.test500a.com"

2020/11/10 18:17:10 [error] 20512#0: *1783168 [lua] events.lua:194: do_handlerlist(): worker-events: event callback failed; source=balancer, event=upstreams, pid=20513 error='/usr/local/share/lua/5.1/kong/runloop/balancer.lua:519: attempt to index local 'upstream' (a nil value) stack traceback: /usr/local/share/lua/5.1/kong/runloop/balancer.lua:519: in function 'create_balancer' /usr/local/share/lua/5.1/kong/runloop/balancer.lua:1050: in function 'do_upstream_event' /usr/local/share/lua/5.1/kong/runloop/balancer.lua:1066: in function 'on_upstream_event' /usr/local/share/lua/5.1/kong/runloop/handler.lua:406: in function </usr/local/share/lua/5.1/kong/runloop/handler.lua:401> [C]: in function 'xpcall' /usr/local/share/lua/5.1/resty/worker/events.lua:185: in function 'do_handlerlist' /usr/local/share/lua/5.1/resty/worker/events.lua:219: in function 'do_event_json' /usr/local/share/lua/5.1/resty/worker/events.lua:361: in function 'poll' /usr/local/share/lua/5.1/resty/worker/events.lua:380: in function </usr/local/share/lua/5.1/resty/worker/events.lua:375>', data={"operation":"update","entity":{"created_at":1604923597,"hash_on":"none","id":"cee3bfb3-fcd8-4f87-aa7a-2086c361d025","algorithm":"round-robin","name":"test500a","hash_on_cookie_path":"/","healthchecks":{"threshold":0,"active":{"unhealthy":{"http_statuses":[429,404,500,501,502,503,504,505],"tcp_failures":0,"timeouts":0,"http_failures":0,"interval":0},"http_path":"/","timeout":1,"type":"http","healthy":{"successes":0,"interval":0,"http_statuses":[200,302]},"https_verify_certificate":true,"concurrency":10},"passive":{"unhealthy":{"http_failures":0,"http_statuses":[429,500,503],"tcp_failures":0,"timeouts":0},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":0},"type":"http"}},"hash_fallback":"none","slots":2800}}, context: ngx.timer

Full changelog We defined "singletons.core_cache:set()" interface in cache.lua. And used it instead of singletons.core_cache:invalidate_local("balancer:upstreams") and singletons.core_cache:get("balancer:upstreams").

Issues resolved Fixes: #6561

Closes: #6569

locao avatar Jan 12 '21 19:01 locao

Hi @locao,

I pushed the commit to resolve kikito's change request in branch fix/upstream_dict_nil of my repository (https://github.com/ranxuxin/kong.git). Could you help me to merge into this repository, please?

Best Regards, Ran Xuxin

ranxuxin001 avatar Jan 13 '21 07:01 ranxuxin001

@ranxuxin I applied all those suggestions and rebased the PR to make it follow our styling better. Feel free to take a look!

Also added a minor performance improvement by caching the empty opts table instead of creating one and throw away immediately after.

dndx avatar Jan 13 '21 07:01 dndx

@dndx , thank you to apply these suggestions. It looks great.

ranxuxin001 avatar Jan 13 '21 12:01 ranxuxin001

@dndx , it seems the new commit didn't pass checks. Could you help me to fix it, please?

ranxuxin001 avatar Jan 14 '21 03:01 ranxuxin001

@ranxuxin Thanks for your contribution! 👍

@dndx Do you have time to solve the conflict to get this PR merged?

mayocream avatar Dec 22 '21 17:12 mayocream

Closing because of lack of activity. We would like to accept this contribution if anyone wishes to pick this back up.

hbagdi avatar Oct 25 '22 21:10 hbagdi

@hbagdi @locao @dndx Does anyone knows where are we now? I'm revisiting old issues.

StarlightIbuki avatar Oct 11 '23 07:10 StarlightIbuki