bug: IP change of an upstream domain in a route is not detected and the old IP is still used

Open wklken opened this issue 5 months ago • 14 comments

Current Behavior

Under some conditions, when the IP of the domain changes, APISIX keeps using the old IP, which causes 504 Gateway Timeout.

It never recovers until APISIX is reloaded.

At the same time, the dig and nslookup commands return the newest IP.

Expected Behavior

APISIX should detect the IP change.

Error Logs

2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:65: parse_domain_for_nodes(): parse_domain_for_nodes: [{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}], client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:69: parse_domain_for_nodes(): parse_domain_for_nodes: host=10.105.226.135, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:84: parse_domain_for_nodes(): parse_domain_for_nodes: add the node back, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] init.lua:213: parse_domain_in_route(): parse_domain_in_route | new_nodes=[{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}], client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] init.lua:219: parse_domain_in_route(): parse_domain_in_route | up_conf:{"timeout":{"send":30,"connect":30,"read":30},"hash_on":"vars","type":"roundrobin","parent":{"update_count":0,"modifiedIndex":5360,"orig_modifiedIndex":5360,"clean_handlers":{},"createdIndex":5360,"has_domain":true,"key":"/bk-gateway-apisix/routes/apigw.prod.2347","value":{"timeout":{"send":30,"connect":30,"read":30},"desc":"Returns anything passed in request data.","name":"apigw-prod-anything-get","labels":{"gateway.bk.tencent.com/stage":"prod","gateway.bk.tencent.com/gateway":"apigw"},"update_time":1752566944,"plugins":{"bk-proxy-rewrite":{"match_subpath":false,"uri":"/anything","subpath_param_name":":ext","method":"GET","use_real_request_uri_unsafe":false},"bk-resource-context":{"bk_resource_name":"anything_get","bk_resource_id":2347,"bk_resource_auth":{"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false,"verified_app_required":false},"bk_resource_auth_obj":{"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false,"verified_app_required":false}}},"uris":["/api/apigw/prod/anything","/api/apigw/prod/anything/"],"upstream":{"timeout":"table: 0x7f119b810dd0","hash_on":"vars","type":"roundrobin","parent":"table: 0x7f1199322a98","original_nodes":[{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}],"nodes":"table: 0x7f11693587e0","pass_host":"node","scheme":"http","nodes_ref":"table: 0x7f11693587e0"},"status":1,"id":"apigw.prod.2347","service_id":"apigw.prod.stage-4","priority":0,"methods":["GET"],"create_time":1752566944}},"original_nodes":"table: 0x7f11693587e0","nodes":"table: 0x7f11693587e0","pass_host":"node","scheme":"http","nodes_ref":"table: 0x7f11693587e0"}, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] init.lua:221: parse_domain_in_route(): parse_domain_in_route | compare result:true, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"
2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] init.lua:223: parse_domain_in_route(): parse_domain_in_route | no change, use old route, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"

Steps to Reproduce

  1. Add a route whose upstream.nodes[0].host is httpbin, a Service (svc) in k8s, so that the route proxies to the httpbin service
$ curl -H "X-API-KEY: $admin_key"  http://127.0.0.1:9180/apisix/admin/routes/apigw.prod.2347 | jq
{
  "key": "/bk-gateway-apisix/routes/apigw.prod.2347",
  "modifiedIndex": 5360,
  "createdIndex": 5360,
  "value": {
    "timeout": {
      "send": 30,
      "connect": 30,
      "read": 30
    },
    "desc": "Returns anything passed in request data.",
    "name": "apigw-prod-anything-get",
    "update_time": 1752566944,
    "plugins": {
      "proxy-rewrite": {
        "method": "GET",
        "uri": "/anything"
      }
    },
    "create_time": 1752566944,
    "upstream": {
      "timeout": {
        "send": 30,
        "connect": 30,
        "read": 30
      },
      "nodes": [
        {
          "weight": 100,
          "priority": 1,
          "port": 80,
          "host": "httpbin"
        }
      ],
      "pass_host": "node",
      "scheme": "http",
      "type": "roundrobin"
    },
    "labels": {
      "gateway.bk.tencent.com/stage": "prod",
      "gateway.bk.tencent.com/gateway": "apigw"
    },
    "id": "apigw.prod.2347",
    "service_id": "apigw.prod.stage-4",
    "status": 1,
    "methods": [
      "GET"
    ],
    "uris": [
      "/api/apigw/prod/anything",
      "/api/apigw/prod/anything/"
    ]
  }
}

Here, route.upstream.nodes[0].host = httpbin.

  1. Add core.log.error calls for debugging

apisix/init.lua

local function parse_domain_in_route(route)
    local nodes = route.value.upstream.nodes
    local new_nodes, err = upstream_util.parse_domain_for_nodes(nodes)
    core.log.error("parse_domain_in_route | new_nodes=", core.json.delay_encode(new_nodes, true))
    if not new_nodes then
        return nil, err
    end

    local up_conf = route.dns_value and route.dns_value.upstream
    core.log.error("parse_domain_in_route | up_conf:", core.json.delay_encode(up_conf, true))
    local ok = upstream_util.compare_upstream_node(up_conf, new_nodes)
    core.log.error("parse_domain_in_route | compare result:", ok)
    if ok then
        core.log.error("parse_domain_in_route | no change, use old route")
        return route
    end

    -- don't modify the modifiedIndex to avoid plugin cache miss because of DNS resolve result
    -- has changed

    -- Here we copy the whole route instead of part of it,
    -- so that we can avoid going back from route.value to route during copying.
    route.dns_value = core.table.deepcopy(route).value
    route.dns_value.upstream.nodes = new_nodes
    core.log.info("parse route which contain domain: ",
                  core.json.delay_encode(route, true))
    return route
end

and

apisix/utils/upstream.lua

local function parse_domain_for_nodes(nodes)
    core.log.error("parse_domain_for_nodes: ", core.json.delay_encode(nodes, true))
    local new_nodes = core.table.new(#nodes, 0)
    for _, node in ipairs(nodes) do
        local host = node.host
        core.log.error("parse_domain_for_nodes: host=", host)
        if not ipmatcher.parse_ipv4(host) and
                not ipmatcher.parse_ipv6(host) then
            local ip, err = core.resolver.parse_domain(host)
            if ip then
                local new_node = core.table.clone(node)
                new_node.host = ip
                new_node.domain = host
                core.table.insert(new_nodes, new_node)
            end

            if err then
                core.log.error("dns resolver domain: ", host, " error: ", err)
            end
        else
            core.log.error("parse_domain_for_nodes: add the node back")
            core.table.insert(new_nodes, node)
        end
    end

    return new_nodes
end
_M.parse_domain_for_nodes = parse_domain_for_nodes

  1. Reload APISIX and update routes in etcd to trigger config_etcd.lua:389: sync_data()
  2. At the same time, delete the httpbin service and kubectl apply it again (the cluster IP will change) [not 100% reproducible]
  3. curl the route

According to the error.log:

  1. The first argument of parse_domain_for_nodes is [{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}]; the host is already an IP here

2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:65: parse_domain_for_nodes(): parse_domain_for_nodes: [{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}], client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"

  1. Since the host is not a domain, core.resolver.parse_domain(host) is not called

2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:69: parse_domain_for_nodes(): parse_domain_for_nodes: host=10.105.226.135, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"

  1. Then the node is added back unchanged

2025/07/16 09:41:20 [error] 6290#6290: *554164 [lua] upstream.lua:84: parse_domain_for_nodes(): parse_domain_for_nodes: add the node back, client: 10.244.2.240, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.paasv3-dev.example.com"


So the worker never detects the IP change until APISIX is reloaded.
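
The stuck state described above can be illustrated with a small standalone sketch. It is a simplified stand-in for parse_domain_for_nodes, not the APISIX code itself: the is_ipv4 helper (IPv4 only) and the resolve callback are assumptions that replace ipmatcher and core.resolver.parse_domain so the snippet runs in plain Lua.

```lua
-- Hypothetical IPv4 check standing in for ipmatcher.parse_ipv4.
local function is_ipv4(host)
    local a, b, c, d = host:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
    if not a then
        return false
    end
    for _, part in ipairs({a, b, c, d}) do
        if tonumber(part) > 255 then
            return false
        end
    end
    return true
end

-- Simplified stand-in for parse_domain_for_nodes; `resolve` is a
-- hypothetical callback replacing core.resolver.parse_domain.
local function parse_domain_for_nodes(nodes, resolve)
    local new_nodes = {}
    for _, node in ipairs(nodes) do
        if is_ipv4(node.host) then
            -- Host is already an IP: the node is added back, no DNS lookup.
            new_nodes[#new_nodes + 1] = node
        else
            local ip = resolve(node.host)
            if ip then
                local new_node = {}
                for k, v in pairs(node) do
                    new_node[k] = v
                end
                new_node.host = ip
                new_node.domain = node.host
                new_nodes[#new_nodes + 1] = new_node
            end
        end
    end
    return new_nodes
end

local lookups = 0
local current_ip = "10.105.226.135"
local function resolve(_)
    lookups = lookups + 1
    return current_ip
end

-- Healthy input: host is a domain, so it is resolved.
local first = parse_domain_for_nodes({{host = "httpbin", port = 80, weight = 100}}, resolve)

-- Broken input: the already-resolved nodes are fed back in after the IP changed.
current_ip = "10.100.183.135"
local second = parse_domain_for_nodes(first, resolve)

print(first[1].host, second[1].host, lookups)
```

In this sketch the second call never reaches the resolver, so the node keeps the old IP no matter what DNS returns.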

Environment

  • APISIX version (run apisix version): 3.2.1
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):

wklken avatar Jul 16 '25 09:07 wklken

local function parse_domain_in_route(route)
    local nodes = route.value.upstream.nodes
    local new_nodes, err = upstream_util.parse_domain_for_nodes(nodes)

nodes = route.value.upstream.nodes should be the original one (host=httpbin), but it seems to have been assigned the new_nodes (with host=ip, domain=httpbin).

Image

Compared to a normal worker, route.value.upstream has two more key-values, original_nodes and nodes_ref, and nodes is a table reference.


nodes_ref is set here:

https://github.com/apache/apisix/blob/3.2.1/apisix/upstream.lua#L240-L241

    up_conf.nodes_ref = filled_nodes
    up_conf.nodes = filled_nodes

So the upstream has been assigned to up_conf somewhere, and its nodes have been replaced by the resolved new_nodes.

But the up_conf here is:

https://github.com/apache/apisix/blob/3.2.1/apisix/upstream.lua#L252

local up_conf = api_ctx.matched_upstream
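
Why this matters can be shown with a minimal sketch of Lua table aliasing. It assumes nothing about where APISIX actually makes the assignment; new_route, set_filled_nodes, and deepcopy are hypothetical helpers, and only the two assignments inside set_filled_nodes come from the lines quoted above.

```lua
-- Minimal recursive copy; sufficient for this cycle-free example.
local function deepcopy(value)
    if type(value) ~= "table" then
        return value
    end
    local copy = {}
    for k, v in pairs(value) do
        copy[k] = deepcopy(v)
    end
    return copy
end

-- Hypothetical route with a domain-based upstream node.
local function new_route()
    return {value = {upstream = {nodes = {{host = "httpbin", port = 80, weight = 100}}}}}
end

-- Hypothetical helper mirroring the two assignments quoted above.
local function set_filled_nodes(up_conf, filled_nodes)
    up_conf.nodes_ref = filled_nodes
    up_conf.nodes = filled_nodes
end

local filled = {{host = "10.105.226.135", domain = "httpbin", port = 80, weight = 100}}

-- Case A: up_conf is the very same table as route.value.upstream.
local aliased_route = new_route()
set_filled_nodes(aliased_route.value.upstream, filled)

-- Case B: up_conf is a deep copy, so the original route is untouched.
local copied_route = new_route()
set_filled_nodes(deepcopy(copied_route.value.upstream), filled)

print(aliased_route.value.upstream.nodes[1].host)  -- 10.105.226.135
print(copied_route.value.upstream.nodes[1].host)   -- httpbin
```

If up_conf ever refers to route.value.upstream itself rather than to a copy, the domain in route.value.upstream.nodes is overwritten by the resolved IP, which matches the symptom in the logs.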

wklken avatar Jul 16 '25 15:07 wklken

I couldn't reproduce this problem in version 3.13.0. Here are my steps:

  1. Deploy the APISIX and httpbin services
curl -sL https://run.api7.ai/apisix/quickstart | sh

docker run -d --name httpbin -p 8080:80 kennethreitz/httpbin
  1. Create a route:
curl http://127.0.0.1:9180/apisix/admin/routes/1 -H "X-API-KEY: $admin_key" -X PUT -i -d '
{
    "uri": "/anything",
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "httpbin:8080": 1
        }
    }
}'
  1. Edit the local hosts file and add the local address
192.168.31.149  httpbin
  1. Send a request to APISIX; it returns normally
curl -i 127.0.0.1:9080/anything

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 311
Connection: keep-alive
Date: Thu, 17 Jul 2025 08:09:08 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Server: APISIX/3.13.0

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Host": "127.0.0.1:9080", 
    "User-Agent": "curl/8.7.1", 
    "X-Forwarded-Host": "127.0.0.1"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "192.168.97.1", 
  "url": "http://127.0.0.1/anything"
}
  1. Change the hosts file entry to a wrong address
192.168.31.148  httpbin
  1. Requests to APISIX now fail with 504; after changing the address back to the correct one, access works normally again.

Baoyuantop avatar Jul 17 '25 08:07 Baoyuantop

@Baoyuantop

I found the log below: during create_radixtree_uri_router, the route is inserted with original_nodes.

This means that while the radixtree is being rebuilt, route.value.upstream.nodes holds the wrong value.

  • Change the IP of route.upstream.nodes[1] and, at the same time, trigger the radixtree rebuild
2025/07/16 09:28:43 [info] 6290#6290: *503710 [lua] route.lua:73: create_radixtree_uri_router(): insert uri route: {"timeout":{"send":30,"connect":30,"read":30},"desc":"Returns anything passed in request data.","name":"apigw-prod-anything-get","labels":{"gateway.bk.tencent.com/stage":"prod","gateway.bk.tencent.com/gateway":"apigw"},"update_time":1752566944,"plugins":{"bk-proxy-rewrite":{"match_subpath":false,"uri":"/anything","subpath_param_name":":ext","method":"GET","use_real_request_uri_unsafe":false},"bk-resource-context":{"bk_resource_name":"anything_get","bk_resource_id":2347,"bk_resource_auth":{"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false,"verified_app_required":false},"bk_resource_auth_obj":{"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false,"verified_app_required":false}}},"uris":["/api/apigw/prod/anything","/api/apigw/prod/anything/"],"upstream":{"timeout":{"send":30,"connect":30,"read":30},"hash_on":"vars","type":"roundrobin","parent":{"update_count":0,"modifiedIndex":5360,"orig_modifiedIndex":5360,"clean_handlers":{},"createdIndex":5360,"has_domain":true,"key":"/bk-gateway-apisix/routes/apigw.prod.2347","value":{"timeout":"table: 0x7f119b810600","desc":"Returns anything passed in request data.","name":"apigw-prod-anything-get","labels":"table: 0x7f119b8105b8","update_time":1752566944,"plugins":"table: 0x7f119b810840","uris":"table: 0x7f119b8107b0","upstream":"table: 0x7f119b810bf0","status":1,"id":"apigw.prod.2347","service_id":"apigw.prod.stage-4","priority":0,"methods":["GET"],"create_time":1752566944}},"original_nodes":[{"weight":100,"host":"10.105.226.135","domain":"httpbin","priority":1,"upstream_host":"httpbin","port":80}],"nodes":"table: 0x7f11693587e0","pass_host":"node","scheme":"http","nodes_ref":"table: 0x7f11693587e0"},"status":1,"id":"apigw.prod.2347","service_id":"apigw.prod.stage-4","priority":0,"methods":"table: 
0x7f119b8107f8","create_time":1752566944}, client: 10.1.0.1, server: _, request: "GET /healthz HTTP/1.1", host: "10.1.1.1:6006"

The same log entry in JSON format:

{
  "timeout": {
    "send": 30,
    "connect": 30,
    "read": 30
  },
  "desc": "Returns anything passed in request data.",
  "name": "apigw-prod-anything-get",
  "labels": {
    "gateway.bk.tencent.com/stage": "prod",
    "gateway.bk.tencent.com/gateway": "apigw"
  },
  "update_time": 1752566944,
  "plugins": {
      ......
  },
  "uris": [
    "/api/apigw/prod/anything",
    "/api/apigw/prod/anything/"
  ],
  "upstream": {
    "timeout": {
      "send": 30,
      "connect": 30,
      "read": 30
    },
    "hash_on": "vars",
    "type": "roundrobin",
    "parent": {
      "update_count": 0,
      "modifiedIndex": 5360,
      "orig_modifiedIndex": 5360,
      "clean_handlers": {},
      "createdIndex": 5360,
      "has_domain": true,
      "key": "/bk-gateway-apisix/routes/apigw.prod.2347",
      "value": {
        "timeout": "table: 0x7f119b810600",
        "desc": "Returns anything passed in request data.",
        "name": "apigw-prod-anything-get",
        "labels": "table: 0x7f119b8105b8",
        "update_time": 1752566944,
        "plugins": "table: 0x7f119b810840",
        "uris": "table: 0x7f119b8107b0",
        "upstream": "table: 0x7f119b810bf0",
        "status": 1,
        "id": "apigw.prod.2347",
        "service_id": "apigw.prod.stage-4",
        "priority": 0,
        "methods": [
          "GET"
        ],
        "create_time": 1752566944
      }
    },
    "original_nodes": [
      {
        "weight": 100,
        "host": "10.105.226.135",
        "domain": "httpbin",
        "priority": 1,
        "upstream_host": "httpbin",
        "port": 80
      }
    ],
    "nodes": "table: 0x7f11693587e0",
    "pass_host": "node",
    "scheme": "http",
    "nodes_ref": "table: 0x7f11693587e0"
  },
  "status": 1,
  "id": "apigw.prod.2347",
  "service_id": "apigw.prod.stage-4",
  "priority": 0,
  "methods": "table: 0x7f119b8107f8",
  "create_time": 1752566944
}

wklken avatar Jul 17 '25 09:07 wklken

Hi @wklken, please pay attention to my steps to reproduce the problem. I can't reproduce this problem. Is there something wrong with my steps?

Baoyuantop avatar Jul 18 '25 08:07 Baoyuantop

> Hi @wklken, please pay attention to my steps to reproduce the problem. I can't reproduce this problem. Is there something wrong with my steps?

You should update the route config in etcd to trigger the radixtree rebuild; the curl request will then block until the rebuild finishes.

I delete and re-apply the k8s svc quickly enough to change the IP behind the svc (resolved via the k8s DNS).


> so the upstream has been assigned to up_conf somewhere, and its nodes have been replaced by the resolved new_nodes

I can't find the code where this assignment happens, and I want to add more core.log calls. Do you have any advice (which module/function)?

wklken avatar Jul 18 '25 08:07 wklken

Image

wklken avatar Jul 20 '25 02:07 wklken

apisix/init.lua

local function parse_domain_in_route(route)
    local nodes = route.value.upstream.nodes
    core.log.error("parse_domain_in_route: route.value.upstream.nodes=", core.json.delay_encode(nodes, true))  -- add log here
    local new_nodes, err = upstream_util.parse_domain_for_nodes(nodes)
    if not new_nodes then
        return nil, err
    end

    local up_conf = route.dns_value and route.dns_value.upstream
    local ok = upstream_util.compare_upstream_node(up_conf, new_nodes)
    if ok then
        return route
    end

    -- don't modify the modifiedIndex to avoid plugin cache miss because of DNS resolve result
    -- has changed

    -- Here we copy the whole route instead of part of it,
    -- so that we can avoid going back from route.value to route during copying.
    route.dns_value = core.table.deepcopy(route).value
    route.dns_value.upstream.nodes = new_nodes
    core.log.info("parse route which contain domain: ",
                  core.json.delay_encode(route, true))
    core.log.error("parse_domain_in_route after parse domain: route.value.upstream=", core.json.delay_encode(route.value.upstream, true))   -- add log here
    core.log.error("parse_domain_in_route after parse domain: route.value.upstream.nodes=", core.json.delay_encode(route.value.upstream.nodes, true))   -- add log here
    return route
end

I got these logs:

  1. Here route.value.upstream.nodes is the original nodes (host is a domain)
2025/07/20 03:41:08 [error] 519#519: *12037 [lua] init.lua:234: parse_domain_in_route(): parse_domain_in_route after parse domain: route.value.upstream.nodes=[{"port":80,"priority":1,"host":"httpbin","weight":100}], client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
  1. Here, on a new request, route.value.upstream.nodes is still the original nodes (host is a domain), but after the domain is parsed it is replaced by the parsed nodes (host is an IP)
2025/07/20 03:41:45 [error] 519#519: *14537 [lua] init.lua:212: parse_domain_in_route(): parse_domain_in_route: route.value.upstream.nodes=[{"port":80,"priority":1,"host":"httpbin","weight":100}], client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"

2025/07/20 03:41:45 [info] 519#519: *14537 [lua] client.lua:123: dns_parse(): dns resolve httpbin, result: {"name":"httpbin.default.svc.cluster.local","section":1,"type":1,"address":"10.100.183.135","class":1,"ttl":30}, client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
2025/07/20 03:41:45 [info] 519#519: *14537 [lua] resolver.lua:84: parse_domain(): parse addr: {"name":"httpbin.default.svc.cluster.local","section":1,"type":1,"ttl":30,"class":1,"address":"10.100.183.135"}, client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
2025/07/20 03:41:45 [info] 519#519: *14537 [lua] resolver.lua:85: parse_domain(): resolver: ["10.96.0.10"], client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
2025/07/20 03:41:45 [info] 519#519: *14537 [lua] resolver.lua:86: parse_domain(): host: httpbin, client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
2025/07/20 03:41:45 [info] 519#519: *14537 [lua] resolver.lua:88: parse_domain(): dns resolver domain: httpbin to 10.100.183.135, client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"

-- the DNS server returns the old IP for now

2025/07/20 03:41:45 [info] 519#519: *14537 [lua] init.lua:231: parse_domain_in_route(): parse route which contain domain: {"has_domain":true,"key":"/bk-gateway-apisix/services/apigw.prod.stage-4","dns_value":{"name":"apigw-prod-anything-get","upstream":{"parent":{"update_count":0,"key":"/bk-gateway-apisix/routes/apigw.prod.2347","createdIndex":5360,"value":{"name":"apigw-prod-anything-get","priority":0,"methods":["GET"],"labels":{"gateway.bk.tencent.com/gateway":"apigw","gateway.bk.tencent.com/stage":"prod"},"id":"apigw.prod.2347","desc":"Returns anything passed in request data.","create_time":1752566944,"status":1,"timeout":{"read":30,"connect":30,"send":30},"uris":["/api/apigw/prod/anything","/api/apigw/prod/anything/"],"upstream":"table: 0x7f1c8356f638","update_time":1752566944,"plugins":{"bk-proxy-rewrite":{"subpath_param_name":":ext","use_real_request_uri_unsafe":false,"uri":"/anything","match_subpath":false,"method":"GET"},"bk-resource-context":{"bk_resource_id":2347,"bk_resource_name":"anything_get","bk_resource_auth":{"verified_app_required":false,"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false},"bk_resource_auth_obj":{"verified_app_required":false,"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false}}},"service_id":"apigw.prod.stage-4"},"modifiedIndex":5360,"orig_modifiedIndex":5360,"has_domain":true,"clean_handlers":{}},"nodes":[{"priority":1,"domain":"httpbin","port":80,"host":"10.100.183.135","weight":100}],"pass_host":"node","type":"roundrobin","scheme":"http","hash_on":"vars","timeout":{"read":30,"connect":30,"send":30}},"desc":"正式环境","id":"apigw.prod.2347","labels":"table: 0x7f1c8356f000","create_time":1752482957,"timeout":"table: 0x7f1c8356f048","update_time":1752566944,"plugins":{"bk-auth-validate":{},"bk-proxy-rewrite":"table: 
0x7f1c8356f2d0","bk-delete-cookie":{},"bk-log-context":{},"prometheus":{"prefer_name":false},"bk-real-ip":{},"bk-stage-context":{"jwt_private_key":"

2025/07/20 03:41:45 [error] 519#519: *14537 [lua] init.lua:233: parse_domain_in_route(): parse_domain_in_route after parse domain: route.value.upstream={"parent":{"update_count":0,"key":"/bk-gateway-apisix/routes/apigw.prod.2347","createdIndex":5360,"value":{"name":"apigw-prod-anything-get","priority":0,"methods":["GET"],"labels":{"gateway.bk.tencent.com/gateway":"apigw","gateway.bk.tencent.com/stage":"prod"},"id":"apigw.prod.2347","desc":"Returns anything passed in request data.","create_time":1752566944,"status":1,"timeout":{"read":30,"connect":30,"send":30},"uris":["/api/apigw/prod/anything","/api/apigw/prod/anything/"],"upstream":{"parent":"table: 0x7f1c827a2378","nodes":[{"priority":1,"domain":"httpbin","port":80,"host":"10.100.183.135","weight":100}],"pass_host":"node","type":"roundrobin","scheme":"http","hash_on":"vars","timeout":{"read":30,"connect":30,"send":30}},"update_time":1752566944,"plugins":{"bk-proxy-rewrite":{"subpath_param_name":":ext","use_real_request_uri_unsafe":false,"uri":"/anything","match_subpath":false,"method":"GET"},"bk-resource-context":{"bk_resource_id":2347,"bk_resource_name":"anything_get","bk_resource_auth":{"verified_app_required":false,"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false},"bk_resource_auth_obj":{"verified_app_required":false,"verified_user_required":false,"resource_perm_required":false,"skip_user_verification":false}}},"service_id":"apigw.prod.stage-4"},"modifiedIndex":5360,"orig_modifiedIndex":5360,"has_domain":true,"clean_handlers":{}},"nodes":"table: 0x7f1c82da69f0","pass_host":"node","type":"roundrobin","scheme":"http","hash_on":"vars","timeout":"table: 0x7f1c8356f818"}, client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
  1. All subsequent requests will use the parsed nodes and cannot detect the change of the domain IP.
2025/07/20 03:41:45 [error] 519#519: *14537 [lua] init.lua:234: parse_domain_in_route(): parse_domain_in_route after parse domain: route.value.upstream.nodes=[{"priority":1,"domain":"httpbin","port":80,"host":"10.100.183.135","weight":100}], client: 10.1.1.1, server: _, request: "GET /api/apigw/prod/anything HTTP/1.1", host: "bkapi.example.com"
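
One conceivable mitigation, sketched here purely as an assumption and not as the upstream fix, is to re-resolve from node.domain whenever a node still carries it, so that nodes polluted with a resolved IP are not passed through as-is. The function name parse_domain_for_nodes_with_domain, the is_ipv4 helper (IPv4 only), the resolve callback, and the IP 10.96.12.34 are all made up for illustration.

```lua
-- Hypothetical IPv4 check standing in for ipmatcher.parse_ipv4.
local function is_ipv4(host)
    local a, b, c, d = host:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
    if not a then
        return false
    end
    for _, part in ipairs({a, b, c, d}) do
        if tonumber(part) > 255 then
            return false
        end
    end
    return true
end

-- Hypothetical variant: prefer the remembered domain over the stored host.
local function parse_domain_for_nodes_with_domain(nodes, resolve)
    local new_nodes = {}
    for _, node in ipairs(nodes) do
        local name = node.domain or node.host
        if is_ipv4(name) then
            -- A literal IP configured by the user: keep it unchanged.
            new_nodes[#new_nodes + 1] = node
        else
            local ip = resolve(name)
            if ip then
                -- Copy the node so the input table is never mutated.
                local new_node = {}
                for k, v in pairs(node) do
                    new_node[k] = v
                end
                new_node.host = ip
                new_node.domain = name
                new_nodes[#new_nodes + 1] = new_node
            end
        end
    end
    return new_nodes
end

-- Input shaped like the polluted nodes in the log above.
local stale = {{priority = 1, domain = "httpbin", port = 80, host = "10.100.183.135", weight = 100}}

-- Hypothetical resolver returning a made-up new cluster IP.
local fresh = parse_domain_for_nodes_with_domain(stale, function(_)
    return "10.96.12.34"
end)

print(stale[1].host, fresh[1].host, fresh[1].domain)
```

This only treats the symptom: the nodes table of the route would still be overwritten, but the worker could recover on the next resolution instead of staying on the old IP until a reload.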

wklken avatar Jul 20 '25 04:07 wklken

Is this fix related to the current issue? https://github.com/apache/apisix/pull/11861

wklken avatar Jul 21 '25 06:07 wklken

> You should update the route config in etcd to trigger the radixtree rebuild; the curl request will then block until the rebuild finishes.

Can you describe this in detail? I don't understand how to do this.

Baoyuantop avatar Jul 21 '25 06:07 Baoyuantop

I had the same problem in version 3.14.2

chaoxiaodi avatar Nov 20 '25 12:11 chaoxiaodi

> I had the same problem in version 3.14.2

@chaoxiaodi can you reproduce it?

We ended up using a patch https://github.com/TencentBlueKing/blueking-apigateway-apisix/blob/master/src/build/patches/002_upstream_parse_domain_for_nodes.patch to fix it in production.

wklken avatar Nov 20 '25 14:11 wklken

> > I had the same problem in version 3.14.2
>
> @chaoxiaodi can you reproduce it?
>
> We ended up using a patch https://github.com/TencentBlueKing/blueking-apigateway-apisix/blob/master/src/build/patches/002_upstream_parse_domain_for_nodes.patch to fix it in production.

Why not submit a PR to this repo?

chaoxiaodi avatar Nov 21 '25 02:11 chaoxiaodi

> I had the same problem in version 3.14.2

Update: version 3.14.1, not 3.14.2

chaoxiaodi avatar Nov 21 '25 02:11 chaoxiaodi

> > > I had the same problem in version 3.14.2
> >
> > @chaoxiaodi can you reproduce it? We ended up using a patch TencentBlueKing/blueking-apigateway-apisix@master/src/build/patches/002_upstream_parse_domain_for_nodes.patch to fix it in production.
>
> Why not submit a PR to this repo?

Because we can't reproduce it 100% of the time; it's hard to reproduce, and we needed to fix the production problem.

wklken avatar Nov 21 '25 08:11 wklken