aeron
aeron copied to clipboard
[Doc] Clarify C Aeron client is not thread safe
It would also be helpful to note this in the Client-Concurrency-Model with something like:
Within a process the C++ and Java Aeron clients are thread safe by default and can be shared between threads to manage the life-cycles of publications and subscriptions. The C Aeron client is not thread safe. It is good practice to have one Aeron client per process but this is not a hard requirement. It is possible to use the NoOpLock in the client when used with the Invoker for greater efficiency when used from only one thread.
What part of aeron_t
functions is not thread-safe? It does all of its operations asynchronously with the client conductor.
I have started to believe aeron_t
is not thread-safe because of some unexpected behavior I have seen in testing. The behavior arises when there are two threads that subscribe to the same publication with the same client and proceed to poll the client at different rates.
I just committed an example by modifying BasicPublisher.c
and BasicSubscriber.c
(thanks to @barko for the subscriber code). If I run BasicPublisher
and BasicSubscriber
then start to repeatedly bounce BasicPublisher
, I eventually start to see old repeated messages. Here is an example output of BasicSubscriber
where each empty line represents the BasicPublisher
being bounced. After the 7th bounce, the threads see the first few messages (1-4) replayed. However, it doesn't seem to be gap filling because all of the messages aren't replayed. After the 8th bounce, the threads see another random few messages replayed (5-8).
recvd[1] 1
recvd[2] 1
recvd[1] 2
recvd[2] 2
recvd[1] 3
recvd[2] 3
recvd[1] 4
recvd[2] 4
recvd[2] 5
recvd[1] 5
recvd[1] 6
recvd[2] 6
recvd[2] 7
recvd[1] 7
recvd[2] 8
recvd[1] 8
recvd[1] 9
recvd[2] 9
recvd[2] 10
recvd[1] 10
recvd[1] 11
recvd[2] 11
recvd[1] 12
recvd[2] 12
recvd[1] 13
recvd[2] 13
recvd[1] 14
recvd[2] 14
recvd[1] 15
recvd[2] 15
recvd[1] 16
recvd[2] 16
recvd[1] 17
recvd[2] 17
recvd[1] 18
recvd[2] 18
recvd[1] 19
recvd[2] 19
recvd[1] 20
recvd[2] 20
recvd[1] 21
recvd[2] 21
recvd[1] 22
recvd[2] 22
recvd[1] 23
recvd[2] 23
recvd[1] 24
recvd[2] 24
recvd[1] 25
recvd[2] 25
recvd[1] 26
recvd[2] 26
recvd[1] 1
recvd[2] 1
recvd[1] 2
recvd[1] 3
recvd[1] 4
recvd[2] 2
recvd[2] 3
recvd[2] 27
recvd[2] 4
recvd[2] 28
recvd[2] 29
recvd[2] 30
recvd[1] 31
recvd[2] 31
recvd[1] 32
recvd[2] 32
recvd[1] 33
recvd[2] 33
recvd[1] 34
recvd[2] 34
recvd[1] 35
recvd[2] 35
recvd[1] 36
recvd[2] 36
recvd[1] 5
recvd[1] 5
recvd[1] 6
recvd[1] 7
recvd[2] 5
recvd[2] 6
recvd[2] 37
recvd[2] 7
recvd[2] 38
recvd[2] 39
recvd[1] 40
recvd[2] 40
recvd[1] 41
recvd[2] 41
recvd[1] 42
recvd[2] 42
recvd[1] 43
recvd[2] 43
recvd[1] 44
recvd[2] 44
recvd[1] 45
recvd[2] 45
recvd[1] 46
recvd[2] 46
I don't think this is an MT-safe issue. Each time a new publication is started, it is a new image. You can see this if you add the callbacks for available images and unavailable images and by looking at the header on the poll handler and looking at the session id. Aeron doesn't keep ordering across images as they are new streams. My guess is that there is an older image that is seen again on one of the iterations. Adding the info above would show that.
Also, we've identified #1512 that an image for 2 subscriptions (which is this case) is not being handled correctly. Not sure if it would show up like this, but possible. Worth trying that and see if things behave a little better.
I can reproduce using #1512.
I've checked out the code locally and run it and I can't reproduce the behaviour that you are seeing (with the latest from master). Can you retest and if it is still and issue provide a bit more detail on how you are stopping and start the publisher.
Closing due to inactivity.