
redpanda: hanging on local shutdown with ctrl-c

Open dotnwat opened this issue 4 years ago • 8 comments

was in the middle of compacting a large 4 GB segment. not sure that is related but it's the only thing that seemed to be going on.

DEBUG 2021-05-20 14:09:08,218 [shard 1] compaction_ctrl - backlog_controller.cc:128 - updating shares 10
DEBUG 2021-05-20 14:09:08,218 [shard 0] compaction_ctrl - backlog_controller.cc:128 - updating shares 10
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

I waited a couple of minutes to see if it would exit but eventually gave up. CPU was running at 100%. seems like it is in the compaction process but getting an abort request:

#0  details::io_fragment::io_fragment (this=0x6040036a0fd0, buf=...) at ../../../src/v/bytes/details/io_fragment.h:28
#1  0x0000558f0e810f4a in iobuf_copy (in=..., len=11) at ../../../src/v/bytes/iobuf.cc:151
#2  0x0000558f0e7c19cc in iobuf_parser_base::copy (this=0x611000550698, len=11) at ../../../src/v/bytes/iobuf_parser.h:91
#3  0x0000558f0e7ba5be in model::parse_one_record_copy_from_buffer(iobuf_const_parser&)::$_1::operator()(iobuf_const_parser&, long) const (this=0x7f91b4473ba0, parser=..., len=11) at ../../../src/v/model/record_utils.cc:172
#4  0x0000558f0e7badec in model::parse_record_headers<iobuf_const_parser, model::parse_one_record_copy_from_buffer(iobuf_const_parser&)::$_1>(iobuf_const_parser&, model::parse_one_record_copy_from_buffer(iobuf_const_parser&)::$_1) (
    parser=..., parser_data=...) at ../../../src/v/model/record_utils.cc:101
#5  0x0000558f0e7b4eb9 in model::do_parse_one_record_from_buffer<iobuf_const_parser, model::parse_one_record_copy_from_buffer(iobuf_const_parser&)::$_1>(iobuf_const_parser&, int, signed char, model::parse_one_record_copy_from_buffer(iobuf_const_parser&)::$_1) (parser=..., record_size=469, attr=0 '\000', parser_data=...) at ../../../src/v/model/record_utils.cc:132
#6  0x0000558f0e7b4191 in model::parse_one_record_copy_from_buffer (parser=...) at ../../../src/v/model/record_utils.cc:170
#7  0x0000558f0da02e53 in model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}::operator()(int) const (this=0x60800050e740) at ../../../src/v/model/record.h:691
#8  0x0000558f0da02bd7 in seastar::futurize<seastar::future<void> >::invoke<model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}&, int const&>(model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}&, int const&) (func=..., args=@0x7f91b4474b00: 1512) at rp_deps_install/include/seastar/core/future.hh:2135
#9  0x0000558f0da02538 in seastar::futurize_invoke<model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}&, int const&>(model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}&, int const&) (func=..., args=@0x7f91b4474b00: 1512) at rp_deps_install/include/seastar/core/future.hh:2166
#10 0x0000558f0da0396d in seastar::internal::do_for_each_state<boost::iterators::counting_iterator<int, boost::use_default, boost::use_default>, model::for_each_record<storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}>(model::record_batch const&, storage::internal::index_rebuilder_reducer::do_index(model::record_batch&&)::$_1::operator()(model::record_batch&) const::{lambda(model::record&)#1}&&)::{lambda(iobuf_const_parser&, model::record&)#1}::operator()(iobuf_const_parser, model::record&)::{lambda(int)#1}>::run_and_dispose() (this=0x60800050e720) at rp_deps_install/include/seastar/core/loop.hh:395

dotnwat avatar May 20 '21 21:05 dotnwat

@mmaslankaprv do you think this might be part of the compaction backlog controller?

dotnwat avatar May 20 '21 21:05 dotnwat

I've experienced the same without the controller merged

mmaslankaprv avatar May 20 '21 21:05 mmaslankaprv

typically this was caused by a background future that was not closed.
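
A minimal sketch of that failure mode (hypothetical names, not actual redpanda code): a fire-and-forget background loop that isn't tied to a gate keeps the reactor busy after stop(), whereas running it inside a seastar::gate lets shutdown wait for it.

// Hypothetical sketch: background_worker and do_work() are illustrative,
// not redpanda code.
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>
#include <seastar/core/loop.hh>

struct background_worker {
    seastar::gate _gate;

    static seastar::future<> do_work(); // some unit of background work

    void start_leaky() {
        // Fire-and-forget: nothing ties this loop to shutdown, so it
        // keeps spinning (and pegging the CPU) after stop() is called.
        (void)seastar::keep_doing([] { return do_work(); });
    }

    void start_tracked() {
        // Run the loop inside the gate; _gate.close() in stop() resolves
        // only once the loop observes the closed gate and exits.
        (void)seastar::with_gate(_gate, [this] {
            return seastar::do_until(
              [this] { return _gate.is_closed(); },
              [] { return do_work(); });
        });
    }

    seastar::future<> stop() { return _gate.close(); }
};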

emaxerrno avatar May 20 '21 22:05 emaxerrno

happening more now. this definitely seems to be a new behavior. this time all I had to do was fetch a lot of data.

TRACE 2021-05-20 18:16:17,889 [shard 2] kafka - request_context.h:144 - sending 1:fetch response {throttle_time_ms=0 error_code={ error_code: none [0] } session_id=0 topics={{name={7sense} partitions={{partition_index=0 error_code={ error_code: none [0] } high_watermark=44670001 last_stable_offset=44670001 log_start_offset=0 aborted={{}} preferred_read_replica=-1 records={{size 0}}}}}}}
INFO  2021-05-20 18:16:17,889 [shard 2] kafka - connection_context.cc:267 - Detected error processing request: std::runtime_error (batched_output_stream is already closed. Ignoring: 76 bytes)
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

dotnwat avatar May 21 '21 01:05 dotnwat

interesting. last time, we found that a seastar::sleep() was being used instead of seastar::sleep_abortable(), so the abort wasn't triggering it. i wonder if something is not checking the abort.
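
for reference, a hedged illustration of that earlier kind of fix (assumed snippet, not the actual patch): a plain seastar::sleep() cannot be interrupted, while seastar::sleep_abortable() wired to an abort_source wakes up as soon as the abort is requested.

// Illustrative only; the abort_source plumbing here is an assumption.
#include <seastar/core/abort_source.hh>
#include <seastar/core/sleep.hh>
#include <chrono>

using namespace std::chrono_literals;

seastar::future<> backoff_ignoring_abort() {
    // Shutdown has to wait out the whole sleep (or every iteration of it).
    return seastar::sleep(10s);
}

seastar::future<> backoff_honoring_abort(seastar::abort_source& as) {
    // Resolves early with seastar::sleep_aborted once as.request_abort()
    // is called during shutdown.
    return seastar::sleep_abortable(10s, as).handle_exception_type(
      [](const seastar::sleep_aborted&) {
          // treat abort as a clean end of the backoff
      });
}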

emaxerrno avatar May 21 '21 02:05 emaxerrno

seems to be chillin in compaction

Thread 2 (Thread 0x7f5c9b7ff700 (LWP 2631856)):
#0  absl::container_internal::raw_hash_set<absl::container_internal::NodeHashMapPolicy<seastar::basic_sstring<unsigned char, unsigned int, 31u, false>, storage::internal::compaction_key_reducer::value_type>, bytes_hasher<unsigned long, &(xxhash_64(unsigned char const*, unsigned long))>, bytes_type_eq, std::__1::allocator<std::__1::pair<seastar::basic_sstring<unsigned char, unsigned int, 31u, false> const, storage::internal::compaction_key_reducer::value_type> > >::iterator::skip_empty_or_deleted (this=<optimized out>) at rp_deps_install/include/absl/container/internal/raw_hash_set.h:688
#1  absl::container_internal::raw_hash_set<absl::container_internal::NodeHashMapPolicy<seastar::basic_sstring<unsigned char, unsigned int, 31u, false>, storage::internal::compaction_key_reducer::value_type>, bytes_hasher<unsigned long, &(xxhash_64(unsigned char const*, unsigned long))>, bytes_type_eq, std::__1::allocator<std::__1::pair<seastar::basic_sstring<unsigned char, unsigned int, 31u, false> const, storage::internal::compaction_key_reducer::value_type> > >::begin (this=0x6010e15c4868) at rp_deps_install/include/absl/container/internal/raw_hash_set.h:930
#2  storage::internal::compaction_key_reducer::operator() (this=0x6010e15c4840, e=...) at ../../../src/v/storage/compaction_reducers.cc:60
#3  0x00005569c9dc4631 in storage::compacted_index_reader::impl::do_consume<storage::internal::compaction_key_reducer>(storage::internal::compaction_key_reducer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::{lambda()#1}::operator()() const (this=0x6010e63ed940) at ../../../src/v/storage/compacted_index_reader.h:93
#4  seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<storage::compacted_index_reader::impl::do_consume<storage::internal::compaction_key_reducer>(storage::internal::compaction_key_reducer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::{lambda()#1}&>(storage::internal::compaction_key_reducer&&) (func=...) at rp_deps_install/include/seastar/core/future.hh:2135
#5  seastar::futurize_invoke<storage::compacted_index_reader::impl::do_consume<storage::internal::compaction_key_reducer>(storage::internal::compaction_key_reducer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::{lambda()#1}&>(storage::internal::compaction_key_reducer&&) (func=...) at rp_deps_install/include/seastar/core/future.hh:2166
#6  seastar::internal::repeater<storage::compacted_index_reader::impl::do_consume<storage::internal::compaction_key_reducer>(storage::internal::compaction_key_reducer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::{lambda()#1}>::run_and_dispose() (this=0x6010e63ed900) at rp_deps_install/include/seastar/core/loop.hh:71
#7  0x00005569ca1dff35 in seastar::reactor::run_tasks (this=0x6010000a6000, tq=...) at /home/nwatkins/src/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2263
#8  seastar::reactor::run_some_tasks (this=this@entry=0x6010000a6000) at /home/nwatkins/src/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2672
#9  0x00005569ca1e2ba8 in seastar::reactor::run (this=0x6010000a6000) at /home/nwatkins/src/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2831
#10 0x00005569ca23b384 in seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_92::operator()() const (this=<optimized out>) at /home/nwatkins/src/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4022
#11 0x00005569ca177f40 in std::__1::__function::__value_func<void ()>::operator()() const (this=<optimized out>) at /home/nwatkins/src/redpanda/vbuild/llvm/install/bin/../include/c++/v1/functional:1885
#12 std::__1::function<void ()>::operator()() const (this=<optimized out>) at /home/nwatkins/src/redpanda/vbuild/llvm/install/bin/../include/c++/v1/functional:2560
#13 seastar::posix_thread::start_routine (arg=<optimized out>) at /home/nwatkins/src/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:60
#14 0x00007f5c9c56c432 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f5c9c18d913 in clone () from /lib64/libc.so.6

dotnwat avatar May 21 '21 18:05 dotnwat

this is in compaction. after compaction completes, redpanda exits nicely.
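
if the repeat loop in that backtrace is the culprit, one plausible direction (a sketch under assumptions, not the actual fix) is to have the per-entry loop check an abort_source between iterations, so ctrl-c interrupts a long compaction instead of waiting for the whole segment. reducer.process_next() below is a hypothetical stand-in for consuming one compacted-index entry.

// Sketch only: the reducer interface and abort_source wiring are assumptions.
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <seastar/core/loop.hh>

template <typename Reducer>
seastar::future<> consume_until_done_or_aborted(
  Reducer& reducer, seastar::abort_source& as) {
    return seastar::repeat([&reducer, &as] {
        if (as.abort_requested()) {
            // Shutdown requested: bail out of the compaction loop early.
            return seastar::make_ready_future<seastar::stop_iteration>(
              seastar::stop_iteration::yes);
        }
        // Consume one entry; returns stop_iteration::yes at end of input.
        return reducer.process_next();
    });
}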

dotnwat avatar May 21 '21 21:05 dotnwat

if it's there then it's probably related to the recent controller-for-compaction changes.

emaxerrno avatar May 21 '21 22:05 emaxerrno