rDSN
bug in asio_rpc_session
@imzhenyu, you may want to run perf-test with the fault injector (for the TCP model) enabled in a native run. I hit core dumps several times.
Here is the stack trace:
#0 0x00007fcbe82b1ce9 in boost::asio::detail::epoll_reactor::start_op (this=0x25be520, op_type=1, descriptor=34, descriptor_data=@0x7fcbc81bea78: 0x0, op=0x7fcba1e40340,
is_continuation=false, allow_speculative=true) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:219
#1 0x00007fcbe82b3b94 in boost::asio::detail::reactive_socket_service_base::start_op (this=0x25be6d8, impl=..., op_type=1, op=0x7fcba1e40340, is_continuation=false, is_non_blocking=true,
noop=false) at /usr/include/boost/asio/detail/impl/reactive_socket_service_base.ipp:213
#2 0x00007fcbe82c15c2 in boost::asio::detail::reactive_socket_service_base::async_send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer> >, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, std::vector<boost::asio::const_buffer>, boost::asio::detail::transfer_all_t, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2> >(boost::asio::detail::reactive_socket_service_base::base_implementation_type &, const boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> > > &, boost::asio::socket_base::message_flags, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> >, boost::asio::detail::transfer_all_t, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2>) (this=0x25be6d8, impl=..., buffers=..., flags=0, handler=...)
at /usr/include/boost/asio/detail/reactive_socket_service_base.hpp:215
#3 0x00007fcbe82c1069 in boost::asio::stream_socket_service<boost::asio::ip::tcp>::async_send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer> >, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, std::vector<boost::asio::const_buffer>, boost::asio::detail::transfer_all_t, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2> >(boost::asio::stream_socket_service<boost::asio::ip::tcp>::implementation_type &, const boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> > > &, boost::asio::socket_base::message_flags, <unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf5222>) (this=0x25be6b0, impl=..., buffers=..., flags=0, handler=<unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf5222>)
at /usr/include/boost/asio/stream_socket_service.hpp:326
#4 0x00007fcbe82c0bf7 in boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >::async_write_some<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer> >, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, std::vector<boost::asio::const_buffer>, boost::asio::detail::transfer_all_t, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2> >(const boost::asio::detail::consuming_buffers<boost::asio::const_buffer, std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> > > &, <unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf3a25>) (
this=0x7fcbc81bea70, buffers=..., handler=<unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf3a25>)
at /usr/include/boost/asio/basic_stream_socket.hpp:732
#5 0x00007fcbe82c0752 in boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> >, boost::asio::detail::transfer_all_t, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2>::operator()(const boost::system::error_code &, std::size_t, int) (this=0x7fcbd6ff9ff0, ec=..., bytes_transferred=0, start=1) at /usr/include/boost/asio/impl/write.hpp:181
#6 0x00007fcbe82c042b in boost::asio::async_write<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, std::vector<boost::asio::const_buffer>, dsn::tools::asio_rpc_session::write(uint64_t)::__lambda2>(boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> > &, const std::vector<boost::asio::const_buffer, std::allocator<boost::asio::const_buffer> > &, <unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf0cfd>) (s=..., buffers=std::vector of length 1, capacity 1 = {...},
handler=<unknown type in /home/weijiesun/rDSN/builder/lib/libdsn.core.so, CU 0xc90622, DIE 0xcf0cfd>) at /usr/include/boost/asio/impl/write.hpp:621
#7 0x00007fcbe82bff05 in dsn::tools::asio_rpc_session::write (this=0x7fcbbc134960, signature=407) at /home/weijiesun/rDSN/src/core/tools/common/asio_rpc_session.cpp:145
#8 0x00007fcbe82c4047 in dsn::tools::asio_rpc_session::send (this=0x7fcbbc134960, signature=407) at /home/weijiesun/rDSN/src/core/tools/common/asio_rpc_session.h:58
#9 0x00007fcbe83428b6 in dsn::rpc_session::on_send_completed (this=0x7fcbbc134960, signature=406) at /home/weijiesun/rDSN/src/core/core/network.cpp:318
#10 0x00007fcbe82bfdc5 in dsn::tools::asio_rpc_session::__lambda2::operator() (__closure=0x7fcbd6ffa3f8, ec=..., length=120)
at /home/weijiesun/rDSN/src/core/tools/common/asio_rpc_session.cpp:141
I suspect this happens because the fault injector closes the socket while other threads are reading from or writing to it.
@shengofsun Yes, our new network failure model closes the socket while other threads are reading from or writing to it. I assumed this would only cause the read/write ops to return an error code, not crash. I will look into it once I reproduce this scenario.
Another way is to refine the net failure model by not closing the socket, but using setsockopt to block the incoming/outgoing traffic instead.
I'd prefer closing the socket, because I think it is closer to real-world TCP failures. Besides, closing the TCP socket helps test some corner cases of rpc_session.
Of course. The question is whether closing the socket in asio is a valid operation in terms of thread safety. We want to mimic real-world failures, but we don't want to violate the programming constraints.
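To make the thread-safety concern concrete: Boost.Asio documents socket objects as unsafe for concurrent use when the same object is shared between threads, and frame #0 of the trace above shows `descriptor_data` pointing at `0x0`, which would be consistent with a close racing against a new write. Below is a minimal sketch of one way to serialize the two, using only the standard library. `guarded_session` and its members are hypothetical names for illustration, not rDSN or asio APIs, and the mutex stands in for whatever serialization the real code would use (a strand, or posting the close onto the I/O thread, would be other options).

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <system_error>

// Hypothetical stand-in for a socket-owning session (not rDSN's asio_rpc_session).
class guarded_session {
public:
    // Starts a write only if the session is still open. After close(),
    // it returns an error code instead of touching the closed handle.
    std::error_code write(const std::string& payload) {
        std::lock_guard<std::mutex> guard(_lock);
        if (_closed) {
            return std::make_error_code(std::errc::not_connected);
        }
        // Placeholder for the real call that would start the socket write.
        _bytes_written += payload.size();
        return std::error_code();
    }

    // What the fault injector would call. Idempotent: a second call is a no-op.
    void close() {
        std::lock_guard<std::mutex> guard(_lock);
        // Placeholder for the real socket close.
        _closed = true;
    }

    std::size_t bytes_written() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _bytes_written;
    }

private:
    mutable std::mutex _lock;
    bool _closed = false;
    std::size_t _bytes_written = 0;
};
```

With this shape, a fault-injected close still surfaces to callers as an ordinary error, so the corner cases of the session logic are still exercised, but no thread starts an operation on a socket that another thread is in the middle of closing.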
@shengofsun I cannot reproduce this on my machine. Could it be that the async read/write ops throw exceptions when other threads call socket::close() concurrently, and the core dump happens because we don't have a C++ try/catch around those two APIs? If that is the case, adding C++ exception handling may fix the issue.
OK, let me try to locate the root cause later.