
bug: 503 errors occur after restarting APISIX when using Consul and APISIX both in Docker

Open caijwei opened this issue 5 months ago • 19 comments

Current Behavior

Restarting APISIX while it is handling continuous requests leads to intermittent 503 Service Unavailable responses.

I have prepared a minimal reproducible setup. The file structure looks like this:

├── apisix
│   ├── apisix.yaml
│   └── config.yaml
├── nginx
│   └── default.conf
├── init.sh
├── restart.sh
└── test.sh

Once the attached apisix-debug.zip is unpacked, the issue can be reproduced easily on any machine that has Docker and curl installed.

  1. Run init.sh to launch Consul and the upstream NGINX container that APISIX will route traffic to.
  2. Run restart.sh to start APISIX.
  3. Run test.sh, which sends continuous requests to APISIX. At first, all responses should return HTTP status code 200.
  4. While test.sh is running, repeatedly execute restart.sh to restart APISIX.
  5. After restarting, even when the new APISIX instance is up and running normally, you will still see intermittent 503 errors in the output of test.sh.

This indicates that APISIX fails to route traffic correctly after a restart, despite the upstream and service discovery being healthy.

Expected Behavior

After restarting APISIX, once it has successfully started and is healthy, it should be able to route traffic to upstream services registered in Consul without returning 503 errors.

Error Logs

When test.sh starts to show 503 responses, you can check the APISIX container logs to see detailed error messages. These logs provide insight into why the requests failed, such as upstream discovery or routing issues.

Steps to Reproduce

  1. Unpack the minimal test setup files.
  2. Run: ./init.sh
  3. Start APISIX: ./restart.sh
  4. Start testing: ./test.sh
  5. In a separate terminal, repeatedly run: ./restart.sh
  6. Watch the test.sh output: it will gradually start showing 503 errors.
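The actual test.sh and restart.sh ship in the attached apisix-debug.zip and are not reproduced in this thread. For readers without the archive, here is a rough, hypothetical approximation of the request loop and of step 5. The function names `send_requests` and `restart_repeatedly`, the request interval, the pause between restarts, and the target URL are all assumptions, not the contents of the zip.

```shell
# Hypothetical approximation of test.sh and of step 5; the real scripts ship in
# apisix-debug.zip. ASSUMPTIONS: APISIX listens on 127.0.0.1:9080 (the default
# node_listen port) and restart.sh is in the current directory.

# send_requests URL [COUNT] [INTERVAL]
# Prints one line per request: "<HH:MM:SS> <status>". The status is 000 when no
# HTTP response arrived at all, e.g. while the APISIX container is restarting.
# COUNT=0 (the default) loops forever.
send_requests() {
  local url=$1 count=${2:-0} interval=${3:-0.2} i=0 code
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    # -m 2 caps each request at 2 seconds; -w prints only the status code.
    code=$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$url")
    printf '%s %s\n' "$(date +%H:%M:%S)" "$code"
    i=$((i + 1))
    sleep "$interval"
  done
}

# restart_repeatedly TIMES [PAUSE]
# Runs ./restart.sh TIMES times, pausing PAUSE seconds between runs so the new
# instance can finish starting before the next restart.
restart_repeatedly() {
  local times=$1 pause=${2:-15} n
  for n in $(seq "$times"); do
    echo "restart #$n" >&2
    ./restart.sh
    sleep "$pause"
  done
}
```

For example, run `send_requests http://127.0.0.1:9080/ | tee run.log` in one terminal and `restart_repeatedly 15` in another. The path `/` is a placeholder; use whatever path the route in apisix/apisix.yaml matches.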

Environment

apisix-debug.zip

caijwei • Jul 03 '25 11:07

After testing, you can clean up all related containers and the network created during this run with the following command:

  docker ps -a --format '{{.Names}}' | grep '^apisix-debug-' | xargs -r docker rm -f && docker network rm apisix-debug

caijwei • Jul 03 '25 11:07

@caijwei, yes, I can reproduce it. Some worker processes always return 503, while others always return 200. I think discovery_consul_callback is never called in some worker processes. Maybe events:register failed or was skipped there. I will add some code to trace it.

hanqingwu • Jul 04 '25 07:07

@caijwei, can you try adding these parameters to apisix/config.yaml:

  events:                             # Event distribution module configuration
    module: lua-resty-events          # Sets the name of the events module used.

With this setting, I also get status 200.

hanqingwu • Jul 04 '25 07:07

@hanqingwu I’d like to provide some additional test details:

The host machine is a MacBook Pro with an M3 chip (Apple Silicon), running APISIX inside Docker. Without explicitly setting worker_processes, the default value is auto, and it starts 8 worker processes as expected.

Here’s the confirmation:

root@4cb79f205ed5:/usr/local/apisix# cat /usr/local/apisix/conf/nginx.conf | grep worker_processes
worker_processes auto;
root@4cb79f205ed5:/usr/local/apisix# ps aux | grep 'nginx: worker process' | grep -v grep | wc -l
8
root@4cb79f205ed5:/usr/local/apisix# lscpu
Architecture:                         aarch64
CPU op-mode(s):                       64-bit
Byte Order:                           Little Endian
CPU(s):                               8
On-line CPU(s) list:                  0-7
Thread(s) per core:                   1
Core(s) per socket:                   8
Socket(s):                            1
Vendor ID:                            0x61
Model:                                0
Stepping:                             0x0
BogoMIPS:                             48.00
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization
Vulnerability Spectre v2:             Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint

As per your suggestion, I added the lua-resty-events module configuration to config.yaml and explicitly set worker_processes: 8.

Unfortunately, neither change had any effect. After running restart.sh several times, intermittent 503 errors still occur quite easily.

caijwei • Jul 04 '25 07:07

Can you try reducing worker_processes to 4?

hanqingwu • Jul 04 '25 08:07

After setting worker_processes to 4, the first few restarts all returned 200 as expected, but on the 13th time I ran ./restart.sh, a 503 error occurred again.

caijwei • Jul 04 '25 08:07

After adding events.module and setting worker_processes to 4, 503 errors still occasionally occur, and they follow a consistent pattern.

During the execution of restart.sh, test.sh is continuously running. If the output of test.sh at the moment of restart is 000, then all responses after the restart are consistently 200. However, if the output is 503 at the time of the restart, intermittent 503 errors will continue to appear even after the restart completes.

This pattern does not hold for the original version of the ZIP file I provided earlier. In that version, the output at the moment of restart is always 000, regardless of when the restart is triggered, yet 503 errors appear more easily and more frequently.
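To check this pattern on a captured run, the output of test.sh can be saved (for example with `tee`) and summarised afterwards. The sketch below assumes, hypothetically, that the status code is the last whitespace-separated field on each output line; `status_histogram` and `codes_after_last_gap` are illustrative names, not part of the attached scripts.

```shell
# Hypothetical helpers for reading saved test.sh output. ASSUMPTION: the HTTP
# status code is the last field on each line, and 000 means "no response".

# status_histogram: count how often each status code occurs in the whole run.
status_histogram() {
  awk 'NF { n[$NF]++ } END { for (c in n) print c, n[c] }' | sort
}

# codes_after_last_gap: count status codes seen after the last 000 line, i.e.
# after the most recent window in which APISIX did not answer at all. Each 000
# line resets the counters, so only the final segment is reported.
codes_after_last_gap() {
  awk 'NF && $NF == "000" { for (c in n) delete n[c]; next }
       NF { n[$NF]++ }
       END { for (c in n) print c, n[c] }' | sort
}
```

`status_histogram < run.log` gives overall counts, while `codes_after_last_gap < run.log` shows whether 503s keep appearing once the last 000 gap is over. Since a restart does not always show up as 000, the second helper only covers restarts that do.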

caijwei • Jul 04 '25 08:07

Yes, I think the root cause is that events:register is not thread safe. When multiple worker processes call events:register concurrently, some registration info can be overwritten by another worker process, so the affected worker process never receives events:post. As a result, discovery_consul_callback is never called in that worker process, and it always returns 503.

So, if your environment does not need high concurrency, setting worker_processes to 1 should work around this temporarily. The final solution still needs to be discussed.
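The suspected failure mode is a classic lost update: two writers read the same shared state, each adds its own entry, and the later write silently discards the earlier one. The thread has not confirmed that this is what happens inside lua-resty-events, so the sketch below is only a generic illustration of the hypothesis, not the module's code. A temporary file stands in for the shared registration state, background jobs stand in for worker processes, and `register_unsafe`, `register_locked` and `run_workers` are hypothetical names.

```shell
# Generic lost-update illustration, NOT lua-resty-events internals. A temp file
# plays the shared registry; each background job plays one worker process.

# register_unsafe FILE ID DELAY: read-modify-write with no lock.
register_unsafe() {
  local file=$1 id=$2 delay=$3 current
  current=$(cat "$file")
  sleep "$delay"                      # widen the gap between read and write
  printf '%s\n' "${current:+$current }$id" > "$file"
}

# register_locked FILE ID DELAY: same update, serialised by a mkdir lock
# (mkdir either creates the directory or fails, atomically).
register_locked() {
  local file=$1 id=$2 delay=$3
  until mkdir "$file.lock" 2>/dev/null; do sleep 0.05; done
  register_unsafe "$file" "$id" "$delay"
  rmdir "$file.lock"
}

# run_workers MODE: two "workers" register at the same time using
# register_MODE (unsafe or locked); prints the final registry contents.
run_workers() {
  local mode=$1 reg
  reg=$(mktemp)
  "register_$mode" "$reg" worker-1 0.2 &
  "register_$mode" "$reg" worker-2 0.4 &
  wait
  cat "$reg"
  rm -f "$reg"
}
```

`run_workers unsafe` prints only `worker-2`: both jobs read the empty registry, and worker-2's later write drops worker-1, which in the hypothesis above corresponds to a worker that never receives events:post. `run_workers locked` prints both entries because the lock serialises the read-modify-write. Setting worker_processes to 1 sidesteps the same race by leaving only one writer.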

hanqingwu • Jul 07 '25 02:07

Hi @caijwei, I can't reproduce this problem with the script and steps you provided. I set worker_processes to 8 and ran restart.sh about 10 times. Right after each run a 503 error appeared, but no further 503 errors appeared after that.

Baoyuantop • Jul 08 '25 07:07

I have the same problem. I switched the events module from lua-resty-events to lua-resty-worker-events, and it works for me.

But can anyone explain why?

Lensual • Jul 21 '25 05:07

Hi @Lensual, can you reproduce this problem stably?

Baoyuantop • Jul 21 '25 07:07


Yes, I can reproduce it reliably with worker_processes > 1 and the lua-resty-events events module.

The response of the memory dump API (GET /v1/discovery/consul/dump) is also unstable: sometimes it returns no service instances at all.

root@dev:/data/apisix# curl 127.0.0.1:9090/v1/discovery/consul/dump
{"config":{"servers":["http://192.168.16.57:8500"],"token":"","fetch_interval":3,"weight":1,"keepalive":true,"sort_type":"origin","timeout":{"read":2000,"connect":2000,"wait":60}},"services":{"my-service-http":[{"weight":1,"port":10003,"host":"192.168.16.21"}],"my-service-grpc":[{"weight":1,"port":10005,"host":"192.168.16.21"}],"my-service-management":[{"weight":1,"port":10004,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-management":[{"weight":1,"port":10001,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-http":[{"weight":1,"port":10000,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-grpc":[{"weight":1,"port":10002,"host":"192.168.16.21"}]}}
root@dev:/data/apisix# curl 127.0.0.1:9090/v1/discovery/consul/dump
{"config":{"servers":["http://192.168.16.57:8500"],"token":"","fetch_interval":3,"weight":1,"keepalive":true,"sort_type":"origin","timeout":{"read":2000,"connect":2000,"wait":60}},"services":{}}
root@dev:/data/apisix# curl 127.0.0.1:9090/v1/discovery/consul/dump
{"config":{"servers":["http://192.168.16.57:8500"],"token":"","fetch_interval":3,"weight":1,"keepalive":true,"sort_type":"origin","timeout":{"read":2000,"connect":2000,"wait":60}},"services":{"my-service-http":[{"weight":1,"port":10003,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-grpc":[{"weight":1,"port":10002,"host":"192.168.16.21"}],"my-service-grpc":[{"weight":1,"port":10005,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-http":[{"weight":1,"port":10000,"host":"192.168.16.21"}],"my-service-management":[{"weight":1,"port":10004,"host":"192.168.16.21"}],"my-service-xxxxxxxxx-management":[{"weight":1,"port":10001,"host":"192.168.16.21"}]}}
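Each request to the Control API is handled by whichever worker process accepts the connection, so alternating empty and full dumps would be consistent with some workers never receiving the Consul update, as suggested earlier in the thread. A hypothetical way to quantify this is to poll the dump endpoint repeatedly and count the empty responses. The function names are illustrative, and the default URL is taken from the transcript above.

```shell
# Hypothetical diagnostic. ASSUMPTION: the Control API listens on
# 127.0.0.1:9090, as in the transcript above.

# probe_consul_dump [URL] [COUNT]: fetch the dump COUNT times, one response per
# line. Each curl call opens a new connection, which any worker may accept.
probe_consul_dump() {
  local url=${1:-http://127.0.0.1:9090/v1/discovery/consul/dump}
  local count=${2:-20} i
  for i in $(seq "$count"); do
    curl -s -m 2 "$url"
    echo
  done
}

# count_empty_dumps: read dump responses (one per line) and tally those whose
# "services" object is empty versus non-empty; blank lines are ignored.
count_empty_dumps() {
  awk 'index($0, "\"services\":{}") { empty++; next }
       NF { full++ }
       END { printf "empty=%d non-empty=%d\n", empty, full }'
}
```

Running `probe_consul_dump | count_empty_dumps` prints a single `empty=N non-empty=M` line. A non-zero `empty` count while Consul itself is healthy would point at the per-worker discovery data rather than at Consul.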

Lensual • Jul 22 '25 08:07

docker-compose.yml

services:
  apisix:
    image: apache/apisix:3.13.0-debian
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./apisix/conf/config.yaml:/usr/local/apisix/conf/config.yaml:ro
      - ./apisix/logs:/usr/local/apisix/logs:rw

config.yaml

# FROM: https://github.com/apache/apisix/blob/release/3.13/conf/config.yaml.example

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# CAUTION: DO NOT MODIFY DEFAULT CONFIGURATIONS IN THIS FILE.
# Keep the custom configurations in conf/config.yaml.
#

apisix:
  # node_listen: 9080          # APISIX listening port.
  node_listen:                 # APISIX listening ports.
    - 9080
  #   - port: 9081
  #   - ip: 127.0.0.2          # If not set, default to `0.0.0.0`
  #     port: 9082
  enable_admin: true           # Admin API
  enable_dev_mode: false       # If true, set nginx `worker_processes` to 1.
  enable_reuseport: true       # If true, enable nginx SO_REUSEPORT option.
  show_upstream_status_in_response_header: false  # If true, include the upstream HTTP status code in
                                                  # the response header `X-APISIX-Upstream-Status`.
                                                  # If false, show `X-APISIX-Upstream-Status` only if
                                                  # the upstream response code is 5xx.
  enable_ipv6: true
  enable_http2: true

  # proxy_protocol:                    # PROXY Protocol configuration
  #   listen_http_port: 9181           # APISIX listening port for HTTP traffic with PROXY protocol.
  #   listen_https_port: 9182          # APISIX listening port for HTTPS traffic with PROXY protocol.
  #   enable_tcp_pp: true              # Enable the PROXY protocol when stream_proxy.tcp is set.
  #   enable_tcp_pp_to_upstream: true  # Enable the PROXY protocol.

  enable_server_tokens: true           # If true, show APISIX version in the `Server` response header.
  extra_lua_path: ""                   # Extend lua_package_path to load third-party code.
  extra_lua_cpath: ""                  # Extend lua_package_cpath to load third-party code.
  # lua_module_hook: "my_project.my_hook"  # Hook module used to inject third-party code into APISIX.

  proxy_cache:      # Proxy Caching configuration
    cache_ttl: 10s  # The default caching time on disk if the upstream does not specify a caching time.
    zones:
      - name: disk_cache_one    # Name of the cache.
        memory_size: 50m        # Size of the memory to store the cache index.
        disk_size: 1G           # Size of the disk to store the cache data.
        disk_path: /tmp/disk_cache_one  # Path to the cache file for disk cache.
        cache_levels: "1:2"               # Cache hierarchy levels of disk cache.
      # - name: disk_cache_two
      #  memory_size: 50m
      #  disk_size: 1G
      #  disk_path: "/tmp/disk_cache_two"
      #  cache_levels: "1:2"
      - name: memory_cache
        memory_size: 50m

  delete_uri_tail_slash: false        # Delete the '/' at the end of the URI
  normalize_uri_like_servlet: false   # If true, use the same path normalization rules as the Java
                                      # servlet specification. See https://github.com/jakartaee/servlet/blob/master/spec/src/main/asciidoc/servlet-spec-body.adoc#352-uri-path-canonicalization, which is used in Tomcat.

  router:
    http: radixtree_host_uri    # radixtree_host_uri: match route by host and URI
                                # radixtree_uri: match route by URI
                                # radixtree_uri_with_parameter: similar to radixtree_uri but match URI with parameters. See https://github.com/api7/lua-resty-radixtree/#parameters-in-path for more details.
    ssl: radixtree_sni          # radixtree_sni: match route by SNI

  # http is the default proxy mode. proxy_mode can be one of `http`, `stream`, or `http&stream`
  proxy_mode: "http"
  # stream_proxy:                 # TCP/UDP L4 proxy
  #   tcp:
  #     - addr: 9100              # Set the TCP proxy listening ports.
  #       tls: true
  #     - addr: "127.0.0.1:9101"
  #   udp:                        # Set the UDP proxy listening ports.
  #     - 9200
  #     - "127.0.0.1:9201"

  # dns_resolver:                 # If not set, read from `/etc/resolv.conf`
  #   - 1.1.1.1
  #   - 8.8.8.8
  # dns_resolver_valid: 30        # Override the default TTL of the DNS records.
  resolver_timeout: 5             # Set the time in seconds that the server will wait for a response from the
                                  # DNS resolver before timing out.
  enable_resolv_search_opt: true  # If true, use search option in the resolv.conf file in DNS lookups.

  ssl:
    enable: true
    listen:                                       # APISIX listening port for HTTPS traffic.
      - port: 9443
        enable_http3: false                       # Enable HTTP/3 (with QUIC). If not set default to `false`.
      # - ip: 127.0.0.3                           # If not set, default to `0.0.0.0`.
      #   port: 9445
      #   enable_http3: true
    #ssl_trusted_certificate: system              # Specifies a file path with trusted CA certificates in the PEM format. The default value is "system".
    ssl_protocols: TLSv1.2 TLSv1.3                # TLS versions supported.
    ssl_ciphers: ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl_session_tickets: false  # If true, session tickets are used for SSL/TLS connections.
                                # Disabled by default because it renders Perfect Forward Secrecy (PFS)
                                # useless. See https://github.com/mozilla/server-side-tls/issues/135.

    # fallback_sni: "my.default.domain"      # Fallback SNI to be used if the client does not send SNI during
    #                                        # the handshake.

  enable_control: true  # Control API
  # control:
  #  ip: 127.0.0.1
  #  port: 9090

  disable_sync_configuration_during_start: false  # Safe exit. TO BE REMOVED.

  data_encryption:                # Data encryption settings.
    enable_encrypt_fields: true   # Whether to encrypt the fields specified in `encrypt_fields` in the plugin schema.
    keyring:                      # This field is used to encrypt the private key of SSL and the `encrypt_fields`
                                  # in plugin schema.
      - THIS_IS_MY_PASSWORD          # Set the encryption key for AES-128-CBC. It should be a hexadecimal string
                                  # of length 16.
      - THIS_IS_MY_PASSWORD          # If not set, APISIX saves the original data into etcd.
                                  # CAUTION: If you would like to update the key, add the new key as the
                                  # first item in the array and keep the older keys below the newly added
                                  # key, so that data can be decrypted with the older keys and encrypted
                                  # with the new key. Removing the old keys directly can render the data
                                  # unrecoverable.

  events:                             # Event distribution module configuration
    module: lua-resty-events          # Sets the name of the events module used.
    #module: lua-resty-worker-events          # Sets the name of the events module used.
                                      # Supported module: lua-resty-worker-events and lua-resty-events
  # status:                     # When enabled, APISIX will provide `/status` and `/status/ready` endpoints
  #   ip: 127.0.0.1               # /status endpoint will return 200 status code if APISIX has successfully started and running correctly
  #   port: 7085                  # /status/ready endpoint will return 503 status code if any of the workers do not receive config from etcd
                                  # or (standalone mode) the config isn't loaded yet either via file or Admin API.
nginx_config:                     # Config for rendering the template to generate nginx.conf
  # user: root                    # Set the execution user of the worker process. This is only
                                  # effective if the master process runs with super-user privileges.
  error_log: logs/error.log       # Location of the error log.
  error_log_level:  warn          # Logging level: info, debug, notice, warn, error, crit, alert, or emerg.
  worker_processes: auto          # Automatically determine the optimal number of worker processes based
                                  # on the available system resources.
                                  # If you want to use multiple cores in a container, you can inject the number of
                                  # CPU cores as environment variable "APISIX_WORKER_PROCESSES".
  enable_cpu_affinity: false      # Disable CPU affinity by default as worker_cpu_affinity affects the
                                  # behavior of APISIX in containers. For example, multiple instances could
                                  # be bound to one CPU core, which is not desirable.
                                  # If APISIX is deployed on a physical machine, CPU affinity can be enabled.
  worker_rlimit_nofile: 20480     # The number of files a worker process can open.
                                  # The value should be larger than worker_connections.
  worker_shutdown_timeout: 240s   # Timeout for a graceful shutdown of worker processes.

  max_pending_timers: 16384       # The maximum number of pending timers that can be active at any given time.
                                  # Error "too many pending timers" indicates the threshold is reached.
  max_running_timers: 4096        # The maximum number of running timers that can be active at any given time.
                                  # The error "lua_max_running_timers are not enough" indicates the
                                  # threshold is reached.

  event:
    worker_connections: 10620

  # envs:                         # Get environment variables.
  #  - TEST_ENV

  meta:
    lua_shared_dict:              # Nginx Lua shared memory zone. Size units are m or k.
      prometheus-metrics: 15m
      standalone-config: 10m

  stream:
    enable_access_log: false                 # Enable stream proxy access logging.
    access_log: logs/access_stream.log       # Location of the stream access log.
    access_log_format: |
      "$remote_addr [$time_local] $protocol $status $bytes_sent $bytes_received $session_time" # Customize log format: http://nginx.org/en/docs/varindex.html
    access_log_format_escape: default        # Escape default or json characters in variables.
    lua_shared_dict:                         # Nginx Lua shared memory zone. Size units are m or k.
      etcd-cluster-health-check-stream: 10m
      lrucache-lock-stream: 10m
      plugin-limit-conn-stream: 10m
      worker-events-stream: 10m
      tars-stream: 1m
      upstream-healthcheck-stream: 10m

  # Add other custom Nginx configurations.
  # Users are responsible for validating the custom configurations
  # to ensure they are not in conflict with APISIX configurations.
  main_configuration_snippet: |
    # Add custom Nginx main configuration to nginx.conf.
    # The configuration should be well indented!
  http_configuration_snippet: |
    # Add custom Nginx http configuration to nginx.conf.
    # The configuration should be well indented!
  http_server_configuration_snippet: |
    # Add custom Nginx http server configuration to nginx.conf.
    # The configuration should be well indented!
  http_server_location_configuration_snippet: |
    # Add custom Nginx http server location configuration to nginx.conf.
    # The configuration should be well indented!
  http_admin_configuration_snippet: |
    # Add custom Nginx admin server configuration to nginx.conf.
    # The configuration should be well indented!
  http_end_configuration_snippet: |
    # Add custom Nginx http end configuration to nginx.conf.
    # The configuration should be well indented!
  stream_configuration_snippet: |
    # Add custom Nginx stream configuration to nginx.conf.
    # The configuration should be well indented!

  http:
    enable_access_log: true             # Enable HTTP proxy access logging.
    access_log: logs/access.log         # Location of the access log.
    access_log_buffer: 16384            # buffer size of access log.
    access_log_format: |
      "$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\""
    # Customize log format: http://nginx.org/en/docs/varindex.html
    access_log_format_escape: default   # Escape default or json characters in variables.
    keepalive_timeout: 60s              # Set the maximum time for which TCP connection keeps alive.
    client_header_timeout: 60s          # Set the maximum time waiting for client to send the entire HTTP
                                        # request header before closing the connection.
    client_body_timeout: 60s            # Set the maximum time waiting for client to send the request body.
    client_max_body_size: 0             # Set the maximum allowed size of the client request body.
                                        # Default to 0, unlimited.
                                        # Unlike Nginx, APISIX does not limit the body size by default.
                                        # If exceeded, the 413 (Request Entity Too Large) error is returned.
    send_timeout: 10s   # Set the maximum time for transmitting a response to the client before closing.
    underscores_in_headers: "on"  # Allow HTTP request headers to contain underscores in their names.
    real_ip_header: X-Real-IP     # https://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_recursive: "off" # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_recursive
    real_ip_from:            # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - "unix:"

    # custom_lua_shared_dict:     # Custom Nginx Lua shared memory zone for nginx.conf. Size units are m or k.
    #  ipc_shared_dict: 100m      # Custom shared cache, format: `cache-key: cache-size`

    proxy_ssl_server_name: true   # Send the server name in the SNI extension when establishing an SSL/TLS
                                  # connection with the upstream server, allowing the upstream server to
                                  # select the appropriate SSL/TLS certificate and configuration based on
                                  # the requested server name.

    upstream:
      keepalive: 320              # Set the maximum number of idle keep-alive connections to the upstream
                                  # servers that each worker process keeps cached.
                                  # When the value is exceeded, the least recently used connection is closed.
      keepalive_requests: 1000    # Set the maximum number of requests that can be served through one
                                  # keep-alive connection.
                                  # After the maximum number of requests is made, the connection is closed.
      keepalive_timeout: 60s      # Set the maximum time for which TCP connection keeps alive.
    charset: utf-8                # Add the charset to the "Content-Type" response header field.
                                  # See http://nginx.org/en/docs/http/ngx_http_charset_module.html#charset
    variables_hash_max_size: 2048 # Set the maximum size of the variables hash table.

    lua_shared_dict:              # Nginx Lua shared memory zone. Size units are m or k.
      internal-status: 10m
      plugin-limit-req: 10m
      plugin-limit-count: 10m
      prometheus-metrics: 10m     # In production, less than 50m is recommended
      plugin-limit-conn: 10m
      upstream-healthcheck: 10m
      worker-events: 10m
      lrucache-lock: 10m
      balancer-ewma: 10m
      balancer-ewma-locks: 10m
      balancer-ewma-last-touched-at: 10m
      plugin-limit-req-redis-cluster-slot-lock: 1m
      plugin-limit-count-redis-cluster-slot-lock: 1m
      plugin-limit-conn-redis-cluster-slot-lock: 1m
      tracing_buffer: 10m
      plugin-api-breaker: 10m
      etcd-cluster-health-check: 10m
      discovery: 1m
      jwks: 1m
      introspection: 10m
      access-tokens: 1m
      ext-plugin: 1m
      tars: 1m
      cas-auth: 10m
      ocsp-stapling: 10m
      mcp-session: 10m

discovery:                      # Service Discovery
#  dns:
#    servers:
#      - "127.0.0.1:8600"         # Replace with the address of your DNS server.
#    resolv_conf: /etc/resolv.conf # Replace with the path to the local DNS resolv config. Configure either "servers" or "resolv_conf".
#    order:                       # Resolve DNS records this order.
#      - last                     # Try the latest successful type for a hostname.
#      - SRV
#      - A
#      - AAAA
#      - CNAME
#  eureka:                        # Eureka
#    host:                        # Eureka address(es)
#      - "http://127.0.0.1:8761"
#    prefix: /eureka/
#    fetch_interval: 30           # Default 30s
#    weight: 100                  # Default weight for node
#    timeout:
#      connect: 2000              # Default 2000ms
#      send: 2000                 # Default 2000ms
#      read: 5000                 # Default 5000ms
#  nacos:                         # Nacos
#    host:                        # Nacos address(es)
#      - "http://${username}:${password}@${host1}:${port1}"
#    prefix: "/nacos/v1/"
#    fetch_interval: 30    # Default 30s
# `weight` is the `default_weight` that will be attached to each discovered node that
# doesn't have a weight explicitly provided in nacos results
#    weight: 100           # Default 100.
#    timeout:
#      connect: 2000       # Default 2000ms
#      send: 2000          # Default 2000ms
#      read: 5000          # Default 5000ms
#    access_key: ""        # Nacos AccessKey ID in Alibaba Cloud, notice that it's for Nacos instances on Microservices Engine (MSE)
#    secret_key: ""        # Nacos AccessKey Secret in Alibaba Cloud, notice that it's for Nacos instances on Microservices Engine (MSE)
#  consul_kv:              # Consul KV
#    servers:              # Consul KV address(es)
#      - "http://127.0.0.1:8500"
#      - "http://127.0.0.1:8600"
#    prefix: "upstreams"
#    skip_keys:                     # Skip special keys
#      - "upstreams/unused_api/"
#    timeout:
#      connect: 2000                # Default 2000ms
#      read: 2000                   # Default 2000ms
#      wait: 60                     # Default 60s
#    weight: 1                      # Default 1
#    fetch_interval: 3              # Default 3s. Effective only when keepalive is false.
#    keepalive: true                # Default to true. Use long pull to query Consul.
#    default_server:                # Define default server to route traffic to.
#      host: "127.0.0.1"
#      port: 20999
#      metadata:
#        fail_timeout: 1            # Default 1ms
#        weight: 1                  # Default 1
#        max_fails: 1               # Default 1
#    dump:                          # Dump the Consul key-value (KV) store to a file.
#       path: "logs/consul_kv.dump" # Location of the dump file.
#       expire: 2592000             # Specify the expiration time of the dump file in units of seconds.
  consul:                          # Consul
    servers:                       # Consul address(es)
      - "http://192.168.16.57:8500"
#      - "http://127.0.0.1:8600"
#    skip_services:                 # Skip services during service discovery.
#      - "service_a"
#    timeout:
#      connect: 2000                # Default 2000ms
#      read: 2000                   # Default 2000ms
#      wait: 60                     # Default 60s
#    weight: 1                      # Default 1
#    fetch_interval: 3              # Default 3s. Effective only when keepalive is false.
#    keepalive: true                # Default to true. Use long pull to query Consul.
#    default_service:               # Define the default service to route traffic to.
#      host: "127.0.0.1"
#      port: 20999
#      metadata:
#        fail_timeout: 1           # Default 1ms
#        weight: 1                 # Default 1
#        max_fails: 1              # Default 1
#    dump:                           # Dump the Consul key-value (KV) store to a file.
#       path: "logs/consul.dump"  # Location of the dump file.
#       expire: 2592000              # Specify the expiration time of the dump file in units of seconds.
#       load_on_init: true           # Default true, load the consul dump file on init
#  kubernetes:                     # Kubernetes service discovery
#    ### kubernetes service discovery both support single-cluster and multi-cluster mode
#    ### applicable to the case where the service is distributed in a single or multiple kubernetes clusters.
#    ### single-cluster mode ###
#    service:
#      schema: https                     # apiserver schema, options [http, https], default https
#      host: ${KUBERNETES_SERVICE_HOST}  # apiserver host, options [ipv4, ipv6, domain, environment variable], default ${KUBERNETES_SERVICE_HOST}
#      port: ${KUBERNETES_SERVICE_PORT}  # apiserver port, options [port number, environment variable], default ${KUBERNETES_SERVICE_PORT}
#    client:
#      # serviceaccount token or path of serviceaccount token_file
#      token_file: ${KUBERNETES_CLIENT_TOKEN_FILE}
#      # token: |-
#       # eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx5ME1DNWdnbmhQNkZCNlZYMXBsT3pYU3BBS2swYzBPSkN3ZnBESGpkUEEif
#       # 6Ikx5ME1DNWdnbmhQNkZCNlZYMXBsT3pYU3BBS2swYzBPSkN3ZnBESGpkUEEifeyJhbGciOiJSUzI1NiIsImtpZCI
#    # kubernetes discovery plugin support use namespace_selector
#    # you can use one of [equal, not_equal, match, not_match] filter namespace
#    namespace_selector:
#      # only save endpoints with namespace equal default
#      equal: default
#      # only save endpoints with namespace not equal default
#      #not_equal: default
#      # only save endpoints with namespace match one of [default, ^my-[a-z]+$]
#      #match:
#      #- default
#      #- ^my-[a-z]+$
#      # only save endpoints with namespace not match one of [default, ^my-[a-z]+$ ]
#      #not_match:
#      #- default
#      #- ^my-[a-z]+$
#    # kubernetes discovery plugin support use label_selector
#    # for the expression of label_selector, please refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
#    label_selector: |-
#      first="a",second="b"
#    # reserved lua shared memory size,1m memory can store about 1000 pieces of endpoint
#    shared_size: 1m #default 1m
#    ### single-cluster mode ###
#    ### multi-cluster mode ###
#  - id: release  # a custom name refer to the cluster, pattern ^[a-z0-9]{1,8}
#    service:
#      schema: https                     # apiserver schema, options [http, https], default https
#      host: ${KUBERNETES_SERVICE_HOST}  # apiserver host, options [ipv4, ipv6, domain, environment variable]
#      port: ${KUBERNETES_SERVICE_PORT}  # apiserver port, options [port number, environment variable]
#    client:
#      # serviceaccount token or path of serviceaccount token_file
#      token_file: ${KUBERNETES_CLIENT_TOKEN_FILE}
#      # token: |-
#       # eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx5ME1DNWdnbmhQNkZCNlZYMXBsT3pYU3BBS2swYzBPSkN3ZnBESGpkUEEif
#       # 6Ikx5ME1DNWdnbmhQNkZCNlZYMXBsT3pYU3BBS2swYzBPSkN3ZnBESGpkUEEifeyJhbGciOiJSUzI1NiIsImtpZCI
#    # kubernetes discovery plugin support use namespace_selector
#    # you can use one of [equal, not_equal, match, not_match] filter namespace
#    namespace_selector:
#      # only save endpoints with namespace equal default
#      equal: default
#      # only save endpoints with namespace not equal default
#      #not_equal: default
#      # only save endpoints with namespace match one of [default, ^my-[a-z]+$]
#      #match:
#      #- default
#      #- ^my-[a-z]+$
#      # only save endpoints with namespace not match one of [default, ^my-[a-z]+$ ]
#      #not_match:
#      #- default
#      #- ^my-[a-z]+$
#    # kubernetes discovery plugin support use label_selector
#    # for the expression of label_selector, please refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
#    label_selector: |-
#      first="a",second="b"
#    # reserved lua shared memory size,1m memory can store about 1000 pieces of endpoint
#    shared_size: 1m #default 1m
#    ### multi-cluster mode ###

graphql:
  max_size: 1048576                # Set the maximum size limitation of graphql in bytes. Default to 1MiB.

# ext-plugin:
#   cmd: ["ls", "-l"]

plugins:                           # plugin list (sorted by priority)
  - real-ip                        # priority: 23000
#  - ai                             # priority: 22900
  - client-control                 # priority: 22000
  - proxy-control                  # priority: 21990
  - request-id                     # priority: 12015
#  - zipkin                         # priority: 12011
  #- skywalking                    # priority: 12010
  #- opentelemetry                 # priority: 12009
#  - ext-plugin-pre-req             # priority: 12000
#  - fault-injection                # priority: 11000
#  - mocking                        # priority: 10900
#  - serverless-pre-function        # priority: 10000
  #- batch-requests                # priority: 4010
  - cors                           # priority: 4000
  - ip-restriction                 # priority: 3000
  - ua-restriction                 # priority: 2999
  - referer-restriction            # priority: 2990
  - csrf                           # priority: 2980
#  - uri-blocker                    # priority: 2900
#  - request-validation             # priority: 2800
#  - chaitin-waf                    # priority: 2700
#  - multi-auth                     # priority: 2600
#  - openid-connect                 # priority: 2599
#  - cas-auth                       # priority: 2597
#  - authz-casbin                   # priority: 2560
#  - authz-casdoor                  # priority: 2559
#  - wolf-rbac                      # priority: 2555
#  - ldap-auth                      # priority: 2540
  - hmac-auth                      # priority: 2530
  - basic-auth                     # priority: 2520
#  - jwt-auth                       # priority: 2510
#  - jwe-decrypt                    # priority: 2509
#  - key-auth                       # priority: 2500
#  - consumer-restriction           # priority: 2400
#  - attach-consumer-label          # priority: 2399
#  - forward-auth                   # priority: 2002
#  - opa                            # priority: 2001
#  - authz-keycloak                 # priority: 2000
  #- error-log-logger              # priority: 1091
  - proxy-cache                    # priority: 1085
  - body-transformer               # priority: 1080
#  - ai-prompt-template             # priority: 1071
#  - ai-prompt-decorator            # priority: 1070
#  - ai-prompt-guard                # priority: 1072
#  - ai-rag                         # priority: 1060
#  - ai-rate-limiting               # priority: 1030
#  - ai-aws-content-moderation      # priority: 1040 TODO: compare priority with other ai plugins
  - proxy-mirror                   # priority: 1010
  - proxy-rewrite                  # priority: 1008
#  - workflow                       # priority: 1006
#  - api-breaker                    # priority: 1005
  - limit-conn                     # priority: 1003
  - limit-count                    # priority: 1002
  - limit-req                      # priority: 1001
  #- node-status                   # priority: 1000
#  - ai-proxy                       # priority: 999
#  - ai-proxy-multi                 # priority: 998
  #- brotli                        # priority: 996
  - gzip                           # priority: 995
  #- server-info                    # priority: 990
#  - traffic-split                  # priority: 966
#  - redirect                       # priority: 900
#  - response-rewrite               # priority: 899
#  - mcp-bridge                     # priority: 510
#  - degraphql                      # priority: 509
#  - kafka-proxy                    # priority: 508
  #- dubbo-proxy                   # priority: 507
#  - grpc-transcode                 # priority: 506
#  - grpc-web                       # priority: 505
#  - http-dubbo                     # priority: 504
#  - public-api                     # priority: 501
  - prometheus                     # priority: 500
#  - datadog                        # priority: 495
#  - lago                           # priority: 415
#  - loki-logger                    # priority: 414
#  - elasticsearch-logger           # priority: 413
#  - echo                           # priority: 412
#  - loggly                         # priority: 411
#  - http-logger                    # priority: 410
#  - splunk-hec-logging             # priority: 409
#  - skywalking-logger              # priority: 408
#  - google-cloud-logging           # priority: 407
#  - sls-logger                     # priority: 406
#  - tcp-logger                     # priority: 405
#  - kafka-logger                   # priority: 403
#  - rocketmq-logger                # priority: 402
#  - syslog                         # priority: 401
#  - udp-logger                     # priority: 400
#  - file-logger                    # priority: 399
#  - clickhouse-logger              # priority: 398
#  - tencent-cloud-cls              # priority: 397
#  - inspect                        # priority: 200
  #- log-rotate                    # priority: 100
  # <- recommend to use priority (0, 100) for your custom plugins
#  - example-plugin                 # priority: 0
  #- gm                            # priority: -43
  #- ocsp-stapling                 # priority: -44
#  - aws-lambda                     # priority: -1899
#  - azure-functions                # priority: -1900
#  - openwhisk                      # priority: -1901
#  - openfunction                   # priority: -1902
#  - serverless-post-function       # priority: -2000
#  - ext-plugin-post-req            # priority: -3000
#  - ext-plugin-post-resp           # priority: -4000

stream_plugins:                    # stream plugin list (sorted by priority)
  - ip-restriction                 # priority: 3000
  - limit-conn                     # priority: 1003
  - mqtt-proxy                     # priority: 1000
  #- prometheus                    # priority: 500
  - syslog                         # priority: 401
  # <- recommend to use priority (0, 100) for your custom plugins


# wasm:
#   plugins:
#     - name: wasm_log
#       priority: 7999
#       file: t/wasm/log/main.go.wasm

# xrpc:
#   protocols:
#     - name: pingpong
plugin_attr:          # Plugin attributes
  log-rotate:         # Plugin: log-rotate
    timeout: 10000    # maximum wait time for a log rotation(unit: millisecond)
    interval: 3600    # Set the log rotate interval in seconds.
    max_kept: 168     # Set the maximum number of log files to keep. If exceeded, historic logs are deleted.
    max_size: -1      # Set the maximum size of log files in bytes before a rotation.
                      # Skip size check if max_size is less than 0.
    enable_compression: false    # Enable log file compression (gzip).
  skywalking:                                     # Plugin: skywalking
    service_name: APISIX                          # Set the service name for SkyWalking reporter.
    service_instance_name: APISIX Instance Name   # Set the service instance name for SkyWalking reporter.
    endpoint_addr: http://127.0.0.1:12800         # Set the SkyWalking HTTP endpoint.
    report_interval: 3                            # Set the reporting interval in second.
  opentelemetry:      # Plugin: opentelemetry
    trace_id_source: x-request-id   # Specify the source of the trace ID for OpenTelemetry traces.
    resource:
      service.name: APISIX          # Set the service name for OpenTelemetry traces.
    collector:
      address: 127.0.0.1:4318       # Set the address of the OpenTelemetry collector to send traces to.
      request_timeout: 3            # Set the timeout for requests to the OpenTelemetry collector in seconds.
      request_headers:              # Set the headers to include in requests to the OpenTelemetry collector.
        Authorization: token        # Set the authorization header to include an access token.
    batch_span_processor:
      drop_on_queue_full: false     # Drop spans when the export queue is full.
      max_queue_size: 1024          # Set the maximum size of the span export queue.
      batch_timeout: 2              # Set the timeout for span batches to wait in the export queue before
                                    # being sent.
      inactive_timeout: 1           # Set the timeout for spans to wait in the export queue before being sent,
                                    # if the queue is not full.
      max_export_batch_size: 16     # Set the maximum number of spans to include in each batch sent to the
                                    # OpenTelemetry collector.
    set_ngx_var: false              # Export opentelemetry variables to NGINX variables.
  prometheus:                               # Plugin: prometheus
    export_uri: /apisix/prometheus/metrics  # Set the URI for the Prometheus metrics endpoint.
    metric_prefix: apisix_                  # Set the prefix for Prometheus metrics generated by APISIX.
    enable_export_server: true              # Enable the Prometheus export server.
    export_addr:                            # Set the address for the Prometheus export server.
      ip: 127.0.0.1                         # Set the IP.
      port: 9091                            # Set the port.
    # metrics:    # Create extra labels from nginx variables: https://nginx.org/en/docs/varindex.html
    #  http_status:
    #    expire: 0 # The expiration time after which metrics are removed. unit: second.
    #              # 0 means the metrics will not expire
    #    extra_labels:
    #      - upstream_addr: $upstream_addr
    #      - status: $upstream_status  # The label name does not need to be the same as the variable name.
    #  http_latency:
    #    expire: 0 # The expiration time after which metrics are removed. unit: second.
    #              # 0 means the metrics will not expire
    #    extra_labels:
    #      - upstream_addr: $upstream_addr
    #  bandwidth:
    #    expire: 0 # The expiration time after which metrics are removed. unit: second.
    #              # 0 means the metrics will not expire
    #    extra_labels:
    #      - upstream_addr: $upstream_addr
    #  upstream_status:
    #    expire: 0 # The expiration time after which metrics are removed. unit: second.
    # default_buckets:
    #   - 10
    #   - 50
    #   - 100
    #   - 200
    #   - 500
  server-info:                        # Plugin: server-info
    report_ttl: 60                    # Set the TTL in seconds for server info in etcd.
                                      # Maximum: 86400. Minimum: 3.
  dubbo-proxy:                        # Plugin: dubbo-proxy
    upstream_multiplex_count: 32      # Set the maximum number of connections that can be multiplexed over
                                      # a single network connection between the Dubbo Proxy and the upstream
                                      # Dubbo services.
  proxy-mirror:                       # Plugin: proxy-mirror
    timeout:                          # Set the timeout for mirrored requests.
      connect: 60s
      read: 60s
      send: 60s
  # redirect:                         # Plugin: redirect
  #   https_port: 8443                # Set the default port used to redirect HTTP to HTTPS.
  inspect:                            # Plugin: inspect
    delay: 3                          # Set the delay in seconds for the frequency of checking the hooks file.
    hooks_file: "/usr/local/apisix/plugin_inspect_hooks.lua"  # Set the path to the Lua file that defines
                                                              # hooks. Only administrators should have
                                                              # write access to this file for security.
  zipkin:                             # Plugin: zipkin
    set_ngx_var: false                # export zipkin variables to nginx variables

deployment:                    # Deployment configurations
  role: traditional            # Set deployment mode: traditional, control_plane, or data_plane.
  role_traditional:
    config_provider: etcd      # Set the configuration center.

  #role_data_plane:            # Set data plane details if role is data_plane.
  #  config_provider: etcd     # Set the configuration center: etcd, xds, or yaml.

  #role_control_plane:         # Set control plane details if role is control_plane.
  #  config_provider: etcd     # Set the configuration center.

  admin:                       # Admin API
    admin_key_required: true   # Enable Admin API authentication by default for security.
    admin_key:
      -
        name: admin                             # admin: write access to configurations.
        key: 'THIS_IS_MY_PASSWORD'   # Set API key for the admin of Admin API.
        role: admin
      # -
      #   name: viewer                            # viewer: read-only to configurations.
      #   key: 4054f7cf07e344346cd3f287985e76a2   # Set API key for the viewer of Admin API.
      #   role: viewer

    enable_admin_cors: true       # Enable Admin API CORS response header `Access-Control-Allow-Origin`.
    enable_admin_ui: true         # Enable embedded APISIX Dashboard UI.
    allow_admin:                  # Limit Admin API access by IP addresses.
      - 127.0.0.0/24              # If not set, any IP address is allowed.
      - 192.168.16.0/24
      - 192.168.17.0/24
      # - "::/64"
    admin_listen:                 # Set the Admin API listening addresses.
      ip: 0.0.0.0                 # Set listening IP.
      port: 9180                  # Set listening port. Beware of port conflict with node_listen.

    # https_admin: true           # Enable SSL for Admin API on IP and port specified in admin_listen.
                                  # Use admin_api_mtls.admin_ssl_cert and admin_api_mtls.admin_ssl_cert_key.
    # admin_api_mtls:             # Set this if `https_admin` is true.
    #   admin_ssl_cert: ""        # Set path to SSL/TLS certificate.
    #   admin_ssl_cert_key: ""    # Set path to SSL/TLS key.
    #   admin_ssl_ca_cert: ""     # Set path to CA certificate used to sign client certificates.

    admin_api_version: v3         # Set the version of Admin API (latest: v3).

  etcd:
    host:                         # Set etcd address(es) in the same etcd cluster.
      - "http://192.168.16.57:2379"   # If TLS is enabled for etcd, use https://127.0.0.1:2379.
    prefix: /apisix               # Set etcd prefix.
    timeout: 30                   # The timeout when connect/read/write to etcd, Set timeout in seconds.
    watch_timeout: 50             # The timeout when watch etcd
    # resync_delay: 5             # Set resync time in seconds after a sync failure.
                                  # The actual resync time would be resync_delay plus 50% random jitter.
    # health_check_timeout: 10    # Set timeout in seconds for etcd health check.
                                  # Default to 10 if not set or a negative value is provided.
    startup_retry: 2              # Set the number of retries to etcd on startup. Default to 2.
    # user: root                  # Set the root username for etcd.
    # password: 5tHkHhYkjr6cQ     # Set the root password for etcd.
    tls:
      # cert: /path/to/cert       # Set the path to certificate used by the etcd client
      # key: /path/to/key         # Set the path to path of key used by the etcd client
      verify: true                # Verify the etcd certificate when establishing a TLS connection with etcd.
      # sni:                      # The SNI for etcd TLS requests.
                                  # If not set, the host from the URL is used.

Lensual, Jul 22 '25 09:07

Hi @Lensual, how many CPU cores does your test device have?

hanqingwu, Jul 22 '25 11:07


@hanqingwu

32 logical CPUs (2 sockets × 8 cores × 2 threads per core).

The Docker test host is an LXC container on Proxmox VE (PVE).

root@dev:/data/apisix# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   32
  On-line CPU(s) list:    0-31
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
    CPU family:           6
    Model:                62
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            2
    Stepping:             4
    CPU max MHz:          3400.0000
    CPU min MHz:          1200.0000
    BogoMIPS:             5199.97
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    512 KiB (16 instances)
  L1i:                    512 KiB (16 instances)
  L2:                     4 MiB (16 instances)
  L3:                     40 MiB (2 instances)
NUMA:                     
  NUMA node(s):           2
  NUMA node0 CPU(s):      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
  NUMA node1 CPU(s):      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Vulnerable
  L1tf:                   Mitigation; PTE Inversion; VMX vulnerable
  Mds:                    Vulnerable; SMT vulnerable
  Meltdown:               Vulnerable
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:             Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected


root@dev:/data/apisix# uname -a
Linux dev 6.8.12-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-5 (2024-12-03T10:26Z) x86_64 x86_64 x86_64 GNU/Linux
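The core count can matter for the following reason. NGINX's `worker_processes auto` starts one worker per detected CPU, and APISIX defaults to `auto`. The `nginx_config` section does not appear in the config pasted above, so this default is an assumption here. On this host that would mean up to 32 workers. Going by the maintainers' later diagnosis that the event module is at fault, each of those workers presumably depends on it to receive the Consul discovery data after a restart. Below is a minimal sketch for counting how many workers actually came up. It assumes `ps` is available inside the APISIX container. `count_nginx_workers` is a hypothetical helper, not part of APISIX.

```shell
#!/usr/bin/env bash
# Count NGINX worker processes in `ps` output read from stdin.
# Assumption: workers carry NGINX's standard process title
# "nginx: worker process"; the master, cache processes and the
# APISIX privileged agent use different titles and are not counted.
count_nginx_workers() {
  # grep -c prints "0" and exits 1 when nothing matches; `|| true`
  # keeps the exit status at 0 without printing anything extra.
  grep -c 'nginx: worker process' || true
}
```

Usage, with `<apisix-container>` as a placeholder for the container name: run `docker exec <apisix-container> ps -eo args | count_nginx_workers` and compare the result with `docker exec <apisix-container> nproc`. During a reload, old workers show up as "nginx: worker process is shutting down" and are counted too.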

Lensual, Jul 24 '25 01:07

We have also run into this issue and are waiting for a community member to fix it.

SkyeYoung, Aug 11 '25 02:08

lua-resty-events

How do I change the event module?

lchpersonal, Aug 14 '25 05:08

@lchpersonal https://github.com/apache/apisix/issues/12398#issuecomment-3034833966

SkyeYoung, Aug 14 '25 06:08

After investigation, we confirmed that the problem is caused by the event module. We plan to replace the event module with a shared dict.

Baoyuantop, Nov 19 '25 06:11
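Once the shared-dict change lands, one way to confirm it is to rerun the reproducer (`test.sh` plus repeated `restart.sh`) and count the status codes returned across several restarts. Below is a minimal sketch of the counting side. `tally_status_codes` and `count_status` are hypothetical helper names, not part of the reproducer.

```shell
#!/usr/bin/env bash
# Summarise HTTP status codes (one per line on stdin) as
# "<code> <count>" lines, ordered by status code.
tally_status_codes() {
  # uniq -c prints "<count> <code>"; awk swaps the two columns.
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Print how many lines on stdin are exactly the given status code.
count_status() {
  local code="$1"
  # -x matches whole lines only; `|| true` keeps a zero count from
  # turning into a non-zero exit status.
  grep -cx -- "$code" || true
}
```

Usage, assuming APISIX listens on its default port 9080 and `<route>` is a placeholder for the route under test: run `while true; do curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9080/<route>; sleep 0.1; done | tee codes.log` while restarting APISIX a few times, then run `tally_status_codes < codes.log`. curl reports `000` when the connection is refused, so `000` entries reflect the window in which the container is down. `503` entries after the instance is back up point to this bug.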