
bug: kafka-logger keeps logging `not found broker` after the Kafka broker comes back up, until APISIX is reloaded

Open guxingke opened this issue 3 years ago • 17 comments

Issue description

The kafka-logger plugin keeps logging `not found broker` after the Kafka broker comes back up, and only recovers after APISIX is reloaded.

Environment

  • apisix version (cmd: apisix version): 2.10.1
  • OS (cmd: uname -a): Linux test 3.10.0-957.el7.x86_64 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • OpenResty / Nginx version (cmd: nginx -V or openresty -V): nginx version: openresty/1.19.3.2 built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) built with OpenSSL 1.1.1l 24 Aug 2021 TLS SNI support enabled configure arguments: --prefix=/duitang/dist/sys/openresty/nginx --with-cc-opt='-O2 -DAPISIX_BASE_VER=0.0.0 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.19 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../ngx_stream_lua-0.0.9 --with-ld-opt='-Wl,-rpath,/duitang/dist/sys/openresty/luajit/lib -Wl,-rpath,-L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib' --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../ngx_multi_upstream_module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../apisix-nginx-module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../lua-var-nginx-module --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module
  • etcd version, if have (cmd: run curl http://127.0.0.1:9090/v1/server_info to get the info from server-info API):{"version":"2.10.1","boot_time":1641439475,"etcd_version":"3.5.0","up_time":83420,"last_report_time":1641522886,"id":"8d53dda0-5ec2-44b6-872c-70f18b0397a8","hostname":"wvr-ngxweb-0-20.duitang.net"}
  • apisix-dashboard version, if have:
  • the plugin runner version, if the issue is about a plugin runner (cmd: depended on the kind of runner):
  • luarocks version, if the issue is about installation (cmd: luarocks --version):

Steps to reproduce

  1. Use the kafka-logger plugin.
  2. Take the Kafka broker down.
  3. Wait until error logs like 'not found broker' appear in the APISIX error log.
  4. Bring the Kafka broker back up.
  5. APISIX does not recover automatically.

Actual result

After the Kafka broker comes back up, logs are still not sent; the error below keeps appearing.

Error log

buffered messages send to kafka err: not found broker, retryable: true, ....

Expected result

No more errors; log delivery should resume automatically once the broker is back up.

guxingke avatar Jan 06 '22 10:01 guxingke

Please provide reproduction steps and environment & version information.

tzssangglass avatar Jan 06 '22 13:01 tzssangglass

Please provide reproduction steps and environment & version information.

description edited

guxingke avatar Jan 07 '22 02:01 guxingke

~~I need a step-by-step reproduction: how you set up Kafka, how you configured the route and the kafka-logger plugin, etc.~~

tzssangglass avatar Jan 07 '22 07:01 tzssangglass

  2. Take the Kafka broker down

How do you do this?

  4. Bring the Kafka broker back up

Is the Kafka broker definitely available at this point?

In my experience the Kafka producer is cached in an lrucache, so updating the kafka-logger configuration with a different cluster_name parameter should generate a new Kafka producer; see the sketch below.
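
A minimal sketch of that workaround, assuming cluster_name is the integer attribute in the kafka-logger schema that takes part in the producer cache key (broker address and topic here are placeholders); bumping the value should force a fresh producer:

{
    "plugins": {
        "kafka-logger": {
            "broker_list": {
                "127.0.0.1": 9092
            },
            "kafka_topic": "test_topic",
            "cluster_name": 2
        }
    }
}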

tzssangglass avatar Jan 07 '22 07:01 tzssangglass

The Kafka broker went down for other reasons in our case.


When I restarted the Kafka brokers, I waited for APISIX to resume sending messages automatically, but it did not. About 5 minutes later I reloaded APISIX, and then it worked fine.


In my experience the Kafka producer is cached in an lrucache, so updating the kafka-logger configuration with a different cluster_name parameter should generate a new Kafka producer.

Maybe next time I will try this solution.

guxingke avatar Jan 07 '22 08:01 guxingke

When I restarted the Kafka brokers, I waited for APISIX to resume sending messages automatically, but it did not. About 5 minutes later I reloaded APISIX, and then it worked fine.

refer to: https://github.com/doujiang24/lua-resty-kafka/issues/70

Maybe we need to add refresh_interval to work around this bug; a sketch follows.
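
For illustration, a minimal sketch of using that option with lua-resty-kafka directly, assuming refresh_interval in the producer config (in milliseconds; when unset, metadata is only refreshed on demand in some versions); broker address and topic are placeholders:

local producer = require("resty.kafka.producer")

local broker_list = {
    { host = "127.0.0.1", port = 9092 },
}

-- refresh_interval asks the client to refresh broker/topic metadata
-- periodically, so stale broker entries are eventually re-resolved
local p = producer:new(broker_list, {
    producer_type = "async",
    refresh_interval = 30 * 1000,  -- refresh metadata every 30s
})

local ok, err = p:send("test_topic", nil, "hello world")
if not ok then
    ngx.log(ngx.ERR, "send to kafka failed: ", err)
end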

@guxingke If I start Kafka with Docker, shut the container down after starting it, and then start it again, does that simulate the broker-down scenario you mentioned?

tzssangglass avatar Jan 07 '22 18:01 tzssangglass

If I start Kafka with Docker, shut the container down after starting it, and then start it again, does that simulate the broker-down scenario you mentioned?

Yes, that works.

Maybe we need to add refresh_interval to work around this bug.

Yeah, it would be best if it could recover automatically.

guxingke avatar Jan 08 '22 02:01 guxingke

If I start Kafka with Docker, shut the container down after starting it, and then start it again, does that simulate the broker-down scenario you mentioned?

@guxingke I was unable to reproduce the scenario with this on the master branch.

Here is my route configuration:

{
    "plugins": {
        "kafka-logger": {
            "broker_list": {
                "127.0.0.1": 9092
            },
            "kafka_topic": "apisix_traffic_logs",
            "key": "key1",
            "timeout": 1,
            "batch_max_size": 1
        }
    },
    "upstream": {
        "nodes": {
            "127.0.0.1:1980": 1
        },
        "type": "roundrobin"
    },
    "uri": "/hello"
}

Here is my exception log:

2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: apisix_traffic_logs, partition_id: 0, length: 1, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080

When I restarted the kafka container, it continued to push messages to kafka normally again.

Note: I did not mount a volume for Kafka, so the restarted Kafka is a brand-new cluster.

tzssangglass avatar Jan 10 '22 05:01 tzssangglass

My kafka-logger configuration:

{
  "batch_max_size": 100,
  "broker_list": {
    "10.200.32.69": 9092,
    "10.200.32.70": 9092,
    "10.200.32.71": 9092
  },
  "disable": false,
  "kafka_topic": "nginx_www"
}

It is configured as a global plugin.

And it uses a user-defined log_format (a sketch of how that is set follows).
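
For reference, a user-defined log_format for kafka-logger lives in the plugin metadata (PUT to /apisix/admin/plugin_metadata/kafka-logger); a minimal sketch with illustrative fields, where $-prefixed values reference nginx variables:

{
    "log_format": {
        "host": "$host",
        "@timestamp": "$time_iso8601",
        "client_ip": "$remote_addr"
    }
}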


Maybe you couldn't reproduce it because our configurations were different.

guxingke avatar Jan 11 '22 07:01 guxingke

Maybe you couldn't reproduce it because our configurations were different.

I need to be able to simulate the Kafka broker-down scenario you mentioned.

tzssangglass avatar Jan 11 '22 09:01 tzssangglass

Maybe your Kafka recovered too quickly? I can't provide more information about this. 😂

guxingke avatar Jan 11 '22 10:01 guxingke

Maybe your Kafka recovered too quickly? I can't provide more information about this. 😂

I got connection refused; I need to reproduce this exception accurately in order to fix it.

tzssangglass avatar Jan 11 '22 13:01 tzssangglass

I searched the logs and found both errors; they look like this:

2022/01/06 16:05:12 [error] 32298#32298: *518806498 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:12 [error] 32296#32296: *518806551 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:13 [error] 32299#32299: *518806730 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:13 [error] 32294#32294: *518806744 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x

The first connection refused appeared at '16:04:44', and the first not found broker at '16:05:13'.

guxingke avatar Jan 12 '22 02:01 guxingke

I don't know how to reproduce not found broker 😂

tzssangglass avatar Jan 12 '22 03:01 tzssangglass

2022/04/11 10:48:19 [error] 46#46: *680816936 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#46: *680817452 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#

kennethToyuan avatar Apr 11 '22 02:04 kennethToyuan

We also ran into this: [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true,

jujiale avatar Sep 21 '22 03:09 jujiale

@kennethToyuan @jujiale Please provide full reproduction steps; it looks like this bug is consistently reproducible in your environments.

tzssangglass avatar Sep 21 '22 14:09 tzssangglass

2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080

Does the topic exist in the kafka cluster?

tokers avatar Sep 22 '22 09:09 tokers

Thanks for your reply. I'm sure I added the topic and that it exists in Kafka. What's more, I found via https://github.com/doujiang24/lua-resty-kafka/issues?q=buffered+messages+send+to+kafka+err%3A+not+found+broker that other people have run into the same situation, so I guess the problem is in lua-resty-kafka.

jujiale avatar Sep 26 '22 01:09 jujiale

@guxingke I was unable to reproduce the scenario with this on the master branch. [...] When I restarted the kafka container, it continued to push messages to kafka normally again.

I find that the problem occurs when using a Kafka cluster (e.g., 3 nodes), but a single node seems fine.

jujiale avatar Jan 30 '23 06:01 jujiale

2024/06/26 09:43:38 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024-06-26T17:43:39.074091868+08:00 2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: apigw.logs.http-access, partition_id: -1, length: 1, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080

Same issue: at first it works, then after a while (with no config changes), when I call the API again the errors above appear and it is unable to write to Kafka.

APISIX is running on K8s with 1 APISIX instance and 3 etcd nodes. Kafka is a single-broker testing node that requires SASL/PLAIN auth; an admin super user is used and the topic exists. A sketch of the SASL config follows.
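
For context, a sketch of what the SASL part of a kafka-logger config looks like in recent APISIX releases (field names follow the current kafka-logger schema as I understand it; host, credentials, and the topic from the logs above are placeholders for the real values):

{
    "kafka-logger": {
        "brokers": [
            {
                "host": "127.0.0.1",
                "port": 9092,
                "sasl_config": {
                    "mechanism": "PLAIN",
                    "user": "admin",
                    "password": "secret"
                }
            }
        ],
        "kafka_topic": "apigw.logs.http-access"
    }
}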

After a while it worked again, but the calls made during the error window were not logged.

Zeeeeta avatar Jun 26 '24 09:06 Zeeeeta