Meta node hangs DDL processing when SASL connection setup times out

Open StrikeW opened this issue 1 year ago • 3 comments

Describe the bug

The Meta node hung again, which blocked all DDLs. The log contains many WARN lines reporting librdkafka connection timeouts:

{"timestamp":"2024-02-24T06:09:17.35828774Z","level":"WARN","fields":{"message":"librdkafka: FAIL [thrd:sasl_ssl://b0-xxx.aws.confluent.cloud:9092/boot]: sasl_ssl://b0-xxx.aws.confluent.cloud:9092/0: Connection setup timed out in state CONNECT (after 30034ms in state CONNECT, 1 identical error(s) suppressed)","log.target":"librdkafka","log.module_path":"madsim_rdkafka::std_::client","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-rdkafka-0.3.0+0.34.0/src/std/client.rs","log.line":78},"target":"librdkafka"}
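When there are many such lines, it can help to pull the broker address and the elapsed time out of each one. The following is a minimal sketch, not part of RisingWave or librdkafka: the function name `parse_timeout` and the regular expression are hypothetical, and the pattern assumes other timeout lines use the same wording as the one above.

```python
import json
import re

# Pattern for the librdkafka message text seen in the log line above.
# Assumption: other connection-timeout lines follow the same wording.
TIMEOUT_RE = re.compile(
    r"(?P<broker>\S+): Connection setup timed out in state (?P<state>\w+) "
    r"\(after (?P<elapsed_ms>\d+)ms in state"
)


def parse_timeout(line: str):
    """Return broker, state and elapsed_ms for a timeout WARN line, else None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(record, dict):
        return None
    # Only WARN records emitted under the librdkafka target are of interest.
    if record.get("level") != "WARN" or record.get("target") != "librdkafka":
        return None
    message = record.get("fields", {}).get("message", "")
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    return {
        "broker": match["broker"],
        "state": match["state"],
        "elapsed_ms": int(match["elapsed_ms"]),
    }
```

Applying `parse_timeout` to each line of an exported log and counting results per broker would show whether the timeouts come from a single broker or from all of them.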

The Meta node cannot process DDL commands, apparently because of the librdkafka connection timeouts. (Image: output of show processlist.)

Error message/log

"message":"librdkafka: FAIL [thrd:sasl_ssl://b0-xxx.aws.confluent.cloud:9092/boot]: sasl_ssl://b0-xxx.aws.confluent.cloud:9092/0: Connection setup timed out in state CONNECT (after 30032ms in state CONNECT, 1 identical error(s) suppressed)"

https://grafana.prod.risingwave.cloud/explore?panes=%7B%22rF-%22:%7B%22datasource%22:%22P5EC303186A5DB006%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bapp%3D%5C%22risingwave-meta-default-0%5C%22,%20namespace%3D%5C%22rwc-g1hmdvc3u9f88otor7j1kbpin2-thumbtack-prod-poc%5C%22%7D%20%7C~%20%60%28WARN%7CERROR%29%60%20%7C%20json%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P5EC303186A5DB006%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221708750800000%22,%22to%22:%221708756259000%22%7D%7D%7D&schemaVersion=1&orgId=1

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

tenant: https://grafana.prod.risingwave.cloud/d/AdminDashboard_Tenant/tenant?var-datasource=PE662C12516FAE815&var-id=3&orgId=1

The version of RisingWave

PostgreSQL 9.5-RisingWave-1.6.1 (02ee186211e44001c645027bf5aca3db5f076d29)

Additional context

No response

StrikeW, Feb 24 '24 06:02

cc @tabVersion @yezizp2012

StrikeW, Feb 24 '24 06:02

Caused by https://github.com/confluentinc/librdkafka/pull/4460. @wangrunji0408, please help update madsim-rdkafka and patch the v1.6 and v1.7 release branches, thanks.

StrikeW, Feb 28 '24 02:02

#15313 for main, #15314 for release-1.7, #15315 for release-1.6

wangrunji0408, Feb 28 '24 03:02