libzmq icon indicating copy to clipboard operation
libzmq copied to clipboard

ypipe_conflate_t::reader_awake has data race

Open alokpr opened this issue 2 years ago • 5 comments

Issue description

TSAN reports data race on ypipe_conflate_t::reader_awake.

Environment

  • libzmq version (commit hash if unreleased): 4.3.4
  • OS: Ubuntu 18.04

Minimal test code / Steps to reproduce the issue

zmq::context_t publisher_context(1);
zmq::socket_t publisher(publisher_context, zmq::socket_type::pub);
publisher.set(zmq::sockopt::conflate, 1);
publisher.connect("ipc:///tmp/foo");

zmq::context_t subscriber_context(1);
zmq::socket_t subscriber(subscriber_context, zmq::socket_type::sub);
subscriber.set(zmq::sockopt::conflate, 1);
subscriber.bind("ipc:///tmp/foo");

while (true) {
  publisher.send();
  subscriber.recv();
}

What's the actual result? (include assertion message & call stack if applicable)

WARNING: ThreadSanitizer: data race
  Read of size 1 at 0x7b38000102f0 by thread T23:
    #0 zmq::ypipe_conflate_t<zmq::msg_t>::flush() <null>
    #1 zmq::pipe_t::flush() src/pipe.cpp:282
    #2 zmq::session_base_t::flush() src/session_base.cpp:236
    #3 zmq::stream_engine_base_t::in_event_internal() src/stream_engine_base.cpp:325
    #4 zmq::stream_engine_base_t::in_event() src/stream_engine_base.cpp:243
    #5 zmq::epoll_t::loop() src/epoll.cpp:206
    #6 zmq::worker_poller_base_t::worker_routine(void*) src/poller_base.cpp:146 
    #7 thread_routine src/thread.cpp:257
    #8 <null> <null> (libtsan.so.0+0x2e3bf)

  Previous write of size 1 at 0x7b38000102f0 by thread T21:
    #0 zmq::ypipe_conflate_t<zmq::msg_t>::check_read() <null>
    #1 zmq::ypipe_conflate_t<zmq::msg_t>::read(zmq::msg_t*) <null>
    #2 zmq::pipe_t::read(zmq::msg_t*) src/pipe.cpp:205
    #3 zmq::xpub_t::xread_activated(zmq::pipe_t*) src/xpub.cpp:103
    #4 zmq::socket_base_t::read_activated(zmq::pipe_t*) src/socket_base.cpp:1722
    #5 zmq::pipe_t::process_activate_read() src/pipe.cpp:290
    #6 zmq::object_t::process_command(zmq::command_t const&) src/object.cpp:75
    #7 zmq::socket_base_t::process_commands(int, bool) src/socket_base.cpp:1505
    #8 zmq::socket_base_t::send(zmq::msg_t*, int) src/socket_base.cpp:1244
    #9 s_sendmsg src/zmq.cpp:381
    #10 zmq_send src/zmq.cpp:409
    #11 zmq::detail::socket_base::send(zmq::const_buffer, zmq::send_flags) zmq.hpp:1905

What's the expected result?

No data race.

alokpr avatar Apr 14 '22 20:04 alokpr

It looks like the only possible value for ypipe_conflate_t::reader_awake is false. So I am not sure why this variable is even needed.

ypipe_conflate_t::reader_awake is initialized with false: https://github.com/zeromq/libzmq/blob/v4.3.4/src/ypipe_conflate.hpp#L50

The only other location where the value is set is in check_read(), where it is again set to false: https://github.com/zeromq/libzmq/blob/v4.3.4/src/ypipe_conflate.hpp#L85

alokpr avatar Apr 14 '22 20:04 alokpr

These are custom data structures, these tools do not understand how they work and report false positives

bluca avatar Apr 15 '22 10:04 bluca

I am not very familiar with zmq but from the callstack reported by tsan, the data race does look valid. It also seems like the member variable reader_awake is unnecessary (please see my previous comment)

alokpr avatar Apr 15 '22 14:04 alokpr

Could you describe how you run TSAN to reproduce this please? I suspect there might be some more data races in the pipes as I see one with ROUTER + HANDOVER, the pipe end not waking up on a term_ack message unless I add some printfs.

mrvn avatar Jul 18 '22 21:07 mrvn

We have an internal unit test using zmq that we run in TSAN config. I will see if I can create a standalone repro.

alokpr avatar Jul 20 '22 00:07 alokpr

These are custom data structures, these tools do not understand how they work and report false positives

Don't run TSAN, it just doesn't understand the sharing model used here, these are all false positives.

bluca avatar Oct 24 '23 10:10 bluca