redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

AddressSanitizer: SEGV on unknown address

Open dotnwat opened this issue 2 years ago • 7 comments

https://buildkite.com/redpanda/redpanda/builds/12919#018225ac-ab26-48df-92dd-dc5ab24fa10f/1500-9014

TRACE 2022-07-22 12:44:06,694 [shard 0] kafka - request_context.h:160 - [172.16.16.27:43258] sending 18:api_versions response {error_code={ error_code: none [0] } api_keys={{api_key=0 min_version=0 max_version=7}, {api_key=1 min_version=4 max_version=11}, {api_key=2 min_version=0 max_version=4}, {api_key=3 min_version=0 max_version=7}, {api_key=9 min_version=1 max_version=7}, {api_key=10 min_version=0 max_version=2}, {api_key=16 min_version=0 max_version=2}, {api_key=18 min_version=0 max_version=3}, {api_key=11 min_version=0 max_version=5}, {api_key=12 min_version=0 max_version=3}, {api_key=13 min_version=0 max_version=3}, {api_key=14 min_version=0 max_version=3}, {api_key=19 min_version=0 max_version=5}, {api_key=8 min_version=1 max_version=7}, {api_key=32 min_version=0 max_version=2}, {api_key=33 min_version=0 max_version=1}, {api_key=20 min_version=0 max_version=3}, {api_key=15 min_version=0 max_version=4}, {api_key=17 min_version=0 max_version=1}, {api_key=36 min_version=0 max_version=1}, {api_key=44 min_version=0 max_version=0}, {api_key=42 min_version=0 max_version=1}, {api_key=29 min_version=0 max_version=1}, {api_key=35 min_version=0 max_version=1}, {api_key=30 min_version=0 max_version=1}, {api_key=31 min_version=0 max_version=1}, {api_key=22 min_version=0 max_version=1}, {api_key=37 min_version=0 max_version=1}, {api_key=23 min_version=0 max_version=3}} throttle_time_ms=0}
TRACE 2022-07-22 12:44:06,697 [shard 0] kafka - requests.cc:97 - [172.16.16.27:43258] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 29580
DEBUG 2022-07-22 12:44:06,697 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-22 12:44:06,698 [shard 0] kafka - request_context.h:160 - [172.16.16.27:43258] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=3 host=docker-rp-7 port=9092 rack={nullopt}}, {node_id=2 host=docker-rp-20 port=9092 rack={nullopt}}, {node_id=1 host=docker-rp-10 port=9092 rack={nullopt}}} cluster_id={redpanda.initializing} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-22 12:44:06,700 [shard 0] kafka - requests.cc:97 - [172.16.16.27:43258] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 29580
DEBUG 2022-07-22 12:44:06,700 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-22 12:44:06,701 [shard 0] kafka - request_context.h:160 - [172.16.16.27:43258] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=3 host=docker-rp-7 port=9092 rack={nullopt}}, {node_id=2 host=docker-rp-20 port=9092 rack={nullopt}}, {node_id=1 host=docker-rp-10 port=9092 rack={nullopt}}} cluster_id={redpanda.initializing} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-22 12:44:06,702 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==2960==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f48a8dc018f bp 0x7fff2b519840 sp 0x7fff2b5195a0 T0)
==2960==The signal is caused by a READ memory access.
==2960==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

dotnwat avatar Jul 22 '22 15:07 dotnwat

@abhijat i noticed you reported an address sanitizer issue a couple days ago related to config. https://github.com/redpanda-data/redpanda/issues/5507

This may or may not be related, but immediately after the last log line here (conn:quota.cc:118 put) we access a configuration value if (_cfg.max_connections()) {

dotnwat avatar Jul 22 '22 15:07 dotnwat

This may be related:

TRACE 2022-07-27 07:49:05,273 [shard 0] rpc - conn_quota.cc:140 - get(172.16.16.27)
TRACE 2022-07-27 07:49:05,273 [shard 0] rpc - conn_quota.cc:147 - Global conn limit disabled
TRACE 2022-07-27 07:49:05,273 [shard 0] rpc - server.cc:230 - kafka rpc protocol - Incoming connection from 172.16.16.27:39778 on "dnslistener"
TRACE 2022-07-27 07:49:05,274 [shard 0] kafka - requests.cc:97 - [172.16.16.27:39778] processing name:api_versions, key:18, version:3 for rdkafka, mem_units: 8124
TRACE 2022-07-27 07:49:05,275 [shard 0] kafka - request_context.h:160 - [172.16.16.27:39778] sending 18:api_versions response {error_code={ error_code: none [0] } api_keys={{api_key=0 min_version=0 max_version=7}, {api_key=1 min_version=4 max_version=11}, {api_key=2 min_version=0 max_version=4}, {api_key=3 min_version=0 max_version=7}, {api_key=9 min_version=1 max_version=7}, {api_key=10 min_version=0 max_version=2}, {api_key=16 min_version=0 max_version=2}, {api_key=18 min_version=0 max_version=3}, {api_key=11 min_version=0 max_version=5}, {api_key=12 min_version=0 max_version=3}, {api_key=13 min_version=0 max_version=3}, {api_key=14 min_version=0 max_version=3}, {api_key=19 min_version=0 max_version=5}, {api_key=8 min_version=1 max_version=7}, {api_key=32 min_version=0 max_version=2}, {api_key=33 min_version=0 max_version=1}, {api_key=20 min_version=0 max_version=3}, {api_key=15 min_version=0 max_version=4}, {api_key=17 min_version=0 max_version=1}, {api_key=36 min_version=0 max_version=1}, {api_key=44 min_version=0 max_version=0}, {api_key=42 min_version=0 max_version=1}, {api_key=29 min_version=0 max_version=1}, {api_key=35 min_version=0 max_version=1}, {api_key=30 min_version=0 max_version=1}, {api_key=31 min_version=0 max_version=1}, {api_key=22 min_version=0 max_version=1}, {api_key=37 min_version=0 max_version=1}, {api_key=23 min_version=0 max_version=3}} throttle_time_ms=0}
TRACE 2022-07-27 07:49:05,278 [shard 0] kafka - requests.cc:97 - [172.16.16.27:39778] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 31116
DEBUG 2022-07-27 07:49:05,278 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-27 07:49:05,279 [shard 0] kafka - request_context.h:160 - [172.16.16.27:39778] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=5 host=docker-rp-21 port=9092 rack={E}}, {node_id=3 host=docker-rp-17 port=9092 rack={C}}, {node_id=1 host=docker-rp-5 port=9092 rack={A}}, {node_id=4 host=docker-rp-8 port=9092 rack={D}}, {node_id=2 host=docker-rp-6 port=9092 rack={B}}, {node_id=6 host=docker-rp-22 port=9092 rack={F}}} cluster_id={redpanda.f66f4221-6103-4b9b-b93f-d2399cdd732f} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-27 07:49:05,281 [shard 0] kafka - requests.cc:97 - [172.16.16.27:39778] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 31116
DEBUG 2022-07-27 07:49:05,281 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-27 07:49:05,282 [shard 0] kafka - request_context.h:160 - [172.16.16.27:39778] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=5 host=docker-rp-21 port=9092 rack={E}}, {node_id=3 host=docker-rp-17 port=9092 rack={C}}, {node_id=1 host=docker-rp-5 port=9092 rack={A}}, {node_id=4 host=docker-rp-8 port=9092 rack={D}}, {node_id=2 host=docker-rp-6 port=9092 rack={B}}, {node_id=6 host=docker-rp-22 port=9092 rack={F}}} cluster_id={redpanda.f66f4221-6103-4b9b-b93f-d2399cdd732f} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-27 07:49:05,284 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4025==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f287269218f bp 0x7fffaeb74c00 sp 0x7fffaeb74960 T0)
==4025==The signal is caused by a READ memory access.
==4025==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.
FAIL test: RackAwarePlacementTest.test_replica_placement.rack_layout_str=ABCDEF.num_partitions=400.replication_factor=5.num_topics=2 (1/28 runs)
  failure at 2022-07-27T08:36:30.473Z: NodeCrash([(<ducktape.cluster.cluster.ClusterNode object at 0x7f65fa248fa0>, '==4025==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f287269218f bp 0x7fffaeb74c00 sp 0x7fffaeb74960 T0)\n')])
      in job https://buildkite.com/redpanda/redpanda/builds/13167#01823e5c-0dc6-448f-8382-23ae5c2e2291

mmaslankaprv avatar Jul 28 '22 07:07 mmaslankaprv

Another one like this:

TRACE 2022-07-28 08:08:18,744 [shard 0] rpc - conn_quota.cc:147 - Global conn limit disabled
TRACE 2022-07-28 08:08:18,744 [shard 0] rpc - server.cc:230 - kafka rpc protocol - Incoming connection from 172.16.16.27:49292 on "dnslistener"
TRACE 2022-07-28 08:08:18,746 [shard 0] kafka - requests.cc:97 - [172.16.16.27:49292] processing name:api_versions, key:18, version:3 for rdkafka, mem_units: 8124
TRACE 2022-07-28 08:08:18,746 [shard 0] kafka - request_context.h:160 - [172.16.16.27:49292] sending 18:api_versions response {error_code={ error_code: none [0] } api_keys={{api_key=0 min_version=0 max_version=7}, {api_key=1 min_version=4 max_version=11}, {api_key=2 min_version=0 max_version=4}, {api_key=3 min_version=0 max_version=7}, {api_key=9 min_version=1 max_version=7}, {api_key=10 min_version=0 max_version=2}, {api_key=16 min_version=0 max_version=2}, {api_key=18 min_version=0 max_version=3}, {api_key=11 min_version=0 max_version=5}, {api_key=12 min_version=0 max_version=3}, {api_key=13 min_version=0 max_version=3}, {api_key=14 min_version=0 max_version=3}, {api_key=19 min_version=0 max_version=5}, {api_key=8 min_version=1 max_version=7}, {api_key=32 min_version=0 max_version=2}, {api_key=33 min_version=0 max_version=1}, {api_key=20 min_version=0 max_version=3}, {api_key=15 min_version=0 max_version=4}, {api_key=17 min_version=0 max_version=1}, {api_key=36 min_version=0 max_version=1}, {api_key=44 min_version=0 max_version=0}, {api_key=42 min_version=0 max_version=1}, {api_key=29 min_version=0 max_version=1}, {api_key=35 min_version=0 max_version=1}, {api_key=30 min_version=0 max_version=1}, {api_key=31 min_version=0 max_version=1}, {api_key=22 min_version=0 max_version=1}, {api_key=37 min_version=0 max_version=1}, {api_key=23 min_version=0 max_version=3}} throttle_time_ms=0}
TRACE 2022-07-28 08:08:18,749 [shard 0] kafka - requests.cc:97 - [172.16.16.27:49292] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 30092
DEBUG 2022-07-28 08:08:18,749 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-28 08:08:18,750 [shard 0] kafka - request_context.h:160 - [172.16.16.27:49292] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=3 host=docker-rp-23 port=9092 rack={nullopt}}, {node_id=2 host=docker-rp-22 port=9092 rack={nullopt}}, {node_id=4 host=docker-rp-24 port=9092 rack={nullopt}}, {node_id=1 host=docker-rp-21 port=9092 rack={nullopt}}} cluster_id={redpanda.f4c206e1-e25d-4621-888c-8088dd896827} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-28 08:08:18,752 [shard 0] kafka - requests.cc:97 - [172.16.16.27:49292] processing name:metadata, key:3, version:4 for rdkafka, mem_units: 30092
DEBUG 2022-07-28 08:08:18,753 [shard 0] cluster - health_monitor_backend.cc:430 - requesing cluster state report with filter: {per_node_filter: {include_partitions: false, ntp_filters: {}}, nodes: {}}, force refresh: false
TRACE 2022-07-28 08:08:18,753 [shard 0] kafka - request_context.h:160 - [172.16.16.27:49292] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=3 host=docker-rp-23 port=9092 rack={nullopt}}, {node_id=2 host=docker-rp-22 port=9092 rack={nullopt}}, {node_id=4 host=docker-rp-24 port=9092 rack={nullopt}}, {node_id=1 host=docker-rp-21 port=9092 rack={nullopt}}} cluster_id={redpanda.f4c206e1-e25d-4621-888c-8088dd896827} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-07-28 08:08:18,755 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==6426==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7fb6ed3ad18f bp 0x7ffd696b8900 sp 0x7ffd696b8660 T0)
==6426==The signal is caused by a READ memory access.
==6426==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

https://ci-artifacts.dev.vectorized.cloud/redpanda/01824390-cb8d-42dc-85a8-d5109df02d76/vbuild/ducktape/results/2022-07-28--001/report.html

mmaslankaprv avatar Jul 28 '22 09:07 mmaslankaprv

Happened again here

TRACE 2022-08-15 13:29:37,481 [shard 0] kafka - request_context.h:160 - [172.16.16.27:47064] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=3 host=docker-rp-23 port=9092 rack={nullopt}}, {node_id=1 host=docker-rp-21 port=9092 rack={nullopt}}, {node_id=2 host=docker-rp-22 port=9092 rack={nullopt}}} cluster_id={redpanda.initializing} controller_id=1 topics={} cluster_authorized_operations=-2147483648}
TRACE 2022-08-15 13:29:37,483 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
TRACE 2022-08-15 13:29:37,483 [shard 0] rpc - conn_quota.cc:130 - leaving put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1303==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f46237ed18f bp 0x7ffe69254d00 sp 0x7ffe69254a60 T0)
==1303==The signal is caused by a READ memory access.
==1303==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

ztlpn avatar Aug 15 '22 16:08 ztlpn

Another one here

ztlpn avatar Aug 16 '22 13:08 ztlpn

An instance of "membership failed to stabilize" where one of the nodes had crashed with this backtrace-less segfault https://buildkite.com/redpanda/redpanda/builds/15463#0183560e-d5c1-4cfe-ab92-9fa0149f700b

TRACE 2022-09-19 15:14:56,846 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
TRACE 2022-09-19 15:14:56,846 [shard 0] rpc - conn_quota.cc:130 - leaving put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==15776==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f9c9e71b18f bp 0x7ffeee6add40 sp 0x7ffeee6adaa0 T0)
==15776==The signal is caused by a READ memory access.
==15776==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

jcsp avatar Sep 20 '22 08:09 jcsp

We asked devprod team to provide a way to store coredumps on Ci to help us investigate this failure

mmaslankaprv avatar Sep 20 '22 09:09 mmaslankaprv

https://buildkite.com/redpanda/redpanda/builds/15755#01837ad0-b2b6-4a08-a088-b2d257defc38

jcsp avatar Sep 26 '22 19:09 jcsp

https://buildkite.com/redpanda/redpanda/builds/15755#01837ad0-b2b6-4a08-a088-b2d257defc38

this one looks different in that it doesn't seem to have this common log prefix ending with

TRACE 2022-09-19 15:14:56,846 [shard 0] rpc - conn_quota.cc:130 - leaving put(172.16.16.27)

?

dotnwat avatar Sep 26 '22 19:09 dotnwat

Seen again in https://buildkite.com/redpanda/redpanda/builds/16208#0183a9c9-4cb9-4911-a107-536c79f8f3d5

https://ci-artifacts.dev.vectorized.cloud/redpanda/0183a9c9-4cb9-4911-a107-536c79f8f3d5/vbuild/ducktape/results/2022-10-05--001/report.html

TRACE 2022-10-05 21:21:59,819 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.27)
TRACE 2022-10-05 21:21:59,819 [shard 0] rpc - conn_quota.cc:130 - leaving put(172.16.16.27)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==13054==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f8ae449d18f bp 0x7ffc12ec9540 sp 0x7ffc12ec92a0 T0)
==13054==The signal is caused by a READ memory access.
==13054==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.```

andrwng avatar Oct 07 '22 18:10 andrwng

https://github.com/redpanda-data/redpanda/pull/6769 is the potential fix. Please re-open if you see this again.

dotnwat avatar Nov 02 '22 17:11 dotnwat