bug: kafka-logger keeps logging `not found broker` after the Kafka broker comes back up, until APISIX is reloaded
Issue description
After the Kafka broker comes back up, kafka-logger keeps logging `not found broker` until APISIX is reloaded.
Environment
- apisix version (cmd: `apisix version`): 2.10.1
- OS (cmd: `uname -a`): Linux test 3.10.0-957.el7.x86_64 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- OpenResty / Nginx version (cmd: `nginx -V` or `openresty -V`): nginx version: openresty/1.19.3.2 built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) built with OpenSSL 1.1.1l 24 Aug 2021 TLS SNI support enabled configure arguments: --prefix=/duitang/dist/sys/openresty/nginx --with-cc-opt='-O2 -DAPISIX_BASE_VER=0.0.0 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.19 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../ngx_stream_lua-0.0.9 --with-ld-opt='-Wl,-rpath,/duitang/dist/sys/openresty/luajit/lib -Wl,-rpath,-L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib' --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../ngx_multi_upstream_module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../apisix-nginx-module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../lua-var-nginx-module --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module
- etcd version, if have (cmd: run `curl http://127.0.0.1:9090/v1/server_info` to get the info from the server-info API): {"version":"2.10.1","boot_time":1641439475,"etcd_version":"3.5.0","up_time":83420,"last_report_time":1641522886,"id":"8d53dda0-5ec2-44b6-872c-70f18b0397a8","hostname":"wvr-ngxweb-0-20.duitang.net"}
- apisix-dashboard version, if have:
- the plugin runner version, if the issue is about a plugin runner (cmd: depended on the kind of runner):
- luarocks version, if the issue is about installation (cmd: `luarocks --version`):
Steps to reproduce
- use kafka-logger
- bring the Kafka broker down
- wait until APISIX error logs like `not found broker` appear
- bring the Kafka broker back up
- APISIX does not recover automatically
Actual result
After the Kafka broker comes back up, log delivery does not resume; the error below keeps appearing.
Error log
buffered messages send to kafka err: not found broker, retryable: true, ....
Expected result
No `not found broker` errors after the broker recovers; log delivery resumes automatically.
~~I need a step-by-step description of how you set up Kafka, how you configured the route, the kafka-logger plugin, etc.~~
- make Kafka broker down
How do you do this?
- make Kafka broker up
Is the Kafka broker definitely available at this point?
In my experience, the Kafka producer is kept in an lrucache, so you could update the kafka-logger configuration with the cluster_name parameter to generate a new Kafka producer.
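As background for why changing cluster_name helps: lua-resty-kafka caches async producers per cluster name for the lifetime of the worker. A minimal Lua sketch of that behavior (my simplification of producer.lua, not the verbatim source):

```lua
-- Simplified sketch of lua-resty-kafka's async producer cache.
-- An async producer, once created, lives for the whole worker
-- lifetime, together with whatever broker metadata it fetched
-- back then -- until APISIX itself is reloaded.
local cluster_inited = {}

local function producer_new(broker_list, producer_config, cluster_name)
    local name = cluster_name or 1
    local opts = producer_config or {}

    -- An async producer created earlier is reused as-is.
    if opts.producer_type == "async" and cluster_inited[name] then
        return cluster_inited[name]
    end

    local p = { broker_list = broker_list, opts = opts }  -- stand-in object
    if opts.producer_type == "async" then
        cluster_inited[name] = p
    end
    return p
end

local p1 = producer_new({ { host = "127.0.0.1", port = 9092 } }, { producer_type = "async" }, 1)
local p1_again = producer_new({ { host = "127.0.0.1", port = 9092 } }, { producer_type = "async" }, 1)
local p2 = producer_new({ { host = "127.0.0.1", port = 9092 } }, { producer_type = "async" }, 2)
assert(p1 == p1_again)  -- same cluster_name: the stale cached producer is reused
assert(p1 ~= p2)        -- new cluster_name: a fresh producer with fresh metadata
```

So updating cluster_name in the kafka-logger config misses this cache and forces a brand-new producer to be built.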
The Kafka broker went down in another case. When I restarted the Kafka brokers and waited for APISIX to resume sending messages automatically, it did not recover; about 5 minutes later I reloaded APISIX and then it worked well.
In my experience, the Kafka producer is kept in an lrucache, so you could update the kafka-logger configuration with the cluster_name parameter to generate a new Kafka producer.
Maybe next time I will try this solution.
When I restarted the Kafka brokers and waited for APISIX to resume sending messages automatically, it did not recover; about 5 minutes later I reloaded APISIX and then it worked well.
refer to: https://github.com/doujiang24/lua-resty-kafka/issues/70
Maybe we need to add a refresh_interval to work around this bug.
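A hedged sketch of what that could look like: lua-resty-kafka's client already accepts a refresh_interval option (milliseconds between metadata refreshes), and as I understand it the producer passes its config table through to the client. The broker list and topic below are just this thread's values:

```lua
-- Sketch: enabling metadata auto-refresh on a lua-resty-kafka async
-- producer (must run inside an OpenResty worker, e.g. in a timer).
local producer = require "resty.kafka.producer"

local broker_list = {
    { host = "10.200.32.69", port = 9092 },
    { host = "10.200.32.70", port = 9092 },
    { host = "10.200.32.71", port = 9092 },
}

local p = producer:new(broker_list, {
    producer_type = "async",
    -- re-fetch topic metadata every 30s, so brokers that come back
    -- after an outage are re-discovered without reloading APISIX
    refresh_interval = 30 * 1000,
})

local ok, err = p:send("nginx_www", nil, "hello from apisix")
if not ok then
    ngx.log(ngx.ERR, "failed to buffer kafka message: ", err)
end
```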
@guxingke If I start Kafka with docker, shut down the container after starting it, and start it again, can I simulate the broker-down scenario you mentioned?
If I start Kafka with docker, shut down the container after starting it, and start it again, can I simulate the broker-down scenario you mentioned?
Yes, that works.
Maybe we need to add a refresh_interval to work around this bug.
Yeah, it's best if it can recover automatically.
If I start Kafka with docker, shut down the container after starting it, and start it again, can I simulate the broker-down scenario you mentioned?
@guxingke I was unable to reproduce the scenario this way on the master branch.
Here is my route configuration:
{
"plugins": {
"kafka-logger": {
"broker_list": {
"127.0.0.1": 9092
},
"kafka_topic": "apisix_traffic_logs",
"key": "key1",
"timeout": 1,
"batch_max_size": 1
}
},
"upstream": {
"nodes": {
"127.0.0.1:1980": 1
},
"type": "roundrobin"
},
"uri": "/hello"
}
Here is my exception log:
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: apisix_traffic_logs, partition_id: 0, length: 1, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
When I restarted the kafka container, it continued to push messages to kafka normally again.
Note: I did not mount a volume for Kafka, so the restarted Kafka is a new cluster.
My kafka-logger configuration:
{
"batch_max_size": 100,
"broker_list": {
"10.200.32.69": 9092,
"10.200.32.70": 9092,
"10.200.32.71": 9092
},
"disable": false,
"kafka_topic": "nginx_www"
}
It is configured as a global plugin and uses a user-defined log_format.
Maybe we failed to reproduce it because our configurations were different.
Maybe we failed to reproduce it because our configurations were different.
I need to be able to simulate the Kafka-broker-down scenario you mentioned.
Maybe your Kafka recovered too quickly? I can't provide more information about this. 😂
Maybe your Kafka recovered too quickly? I can't provide more information about this. 😂
I got `connection refused`; I need to be able to reproduce this exception accurately in order to fix it.
I retrieved the logs and found both errors; they look like this:
2022/01/06 16:05:12 [error] 32298#32298: *518806498 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:12 [error] 32296#32296: *518806551 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:13 [error] 32299#32299: *518806730 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:13 [error] 32294#32294: *518806744 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
The first `connection refused` appeared at 16:04:44, and the first `not found broker` at 16:05:13.
I don't know how to reproduce `not found broker`. 😂
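For what it's worth, my reading of lua-resty-kafka (an assumption on my part, not verified against these clusters) is that `not found broker` comes from the broker lookup done with cached metadata, roughly shaped like this:

```lua
-- Rough sketch of the failing lookup inside lua-resty-kafka's client
-- (approximate shape, not the verbatim source).
local function choose_broker(client, topic, partition_id)
    -- both tables were filled the last time metadata was fetched
    local partitions = client.topic_partitions[topic]
    if not partitions then
        return nil, "not found topic"
    end

    local partition = partitions[partition_id]
    if not partition then
        return nil, "not found partition"
    end

    local broker_conf = client.brokers[partition.leader]
    if not broker_conf then
        -- stale metadata: the recorded leader id points at a broker
        -- that is no longer in the cached broker table
        return nil, "not found broker"
    end

    return broker_conf
end
```

If that is right, it would also fit the later observation in this thread that a 3-node cluster hits this more often than a single node: a full cluster restart moves partition leadership between broker ids, while a single restarted broker usually comes back under the same id.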
2022/04/11 10:48:19 [error] 46#46: *680816936 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#46: *680817452 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#
We also suffered the same thing: [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true,
@kennethToyuan @jujiale please provide full reproduction steps; it looks like this bug reproduces consistently in your environment.
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
Does the topic exist in the kafka cluster?
Thanks for your reply. I'm sure I added the topic and it exists in Kafka. What's more, I found in https://github.com/doujiang24/lua-resty-kafka/issues?q=buffered+messages+send+to+kafka+err%3A+not+found+broker that other people have run into the same situation as me, so I guess the problem is in lua-resty-kafka.
I find that when using a Kafka cluster (such as 3 nodes), this problem occurs, but a single node seems OK.
2024/06/26 09:43:38 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024-06-26T17:43:39.074091868+08:00 2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: apigw.logs.http-access, partition_id: -1, length: 1, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
Same issue: at first it works, then after a while (no config changed), when I call the API again the above errors appear and it is unable to write to Kafka.
APISIX is running on K8s with 1 APISIX node and 3 etcd nodes. Kafka is a single-broker testing node that requires SASL/PLAIN auth; the admin super user is used and the topic exists.
After a while it worked again, but the calls made during the error window were not logged.