[Doc] Clarify C Aeron client is not thread safe

Open maxgawason opened this issue 1 year ago • 6 comments

It would also be helpful to note this in the Client-Concurrency-Model with something like:

Within a process the C++ and Java Aeron clients are thread safe by default and can be shared between threads to manage the life-cycles of publications and subscriptions. The C Aeron client is not thread safe. It is good practice to have one Aeron client per process but this is not a hard requirement. It is possible to use the NoOpLock in the client when used with the Invoker for greater efficiency when used from only one thread.

Sep 22 '23 19:09 maxgawason

What part of aeron_t functions is not thread-safe? It does all of its operations asynchronously with the client conductor.

Sep 29 '23 19:09 tmontgomery

I have started to believe aeron_t is not thread-safe because of some unexpected behavior I have seen in testing. The behavior arises when there are two threads that subscribe to the same publication with the same client and proceed to poll the client at different rates.

I just committed an example by modifying BasicPublisher.c and BasicSubscriber.c (thanks to @barko for the subscriber code). If I run BasicPublisher and BasicSubscriber and then repeatedly bounce BasicPublisher, I eventually start to see old messages repeated. Here is an example output of BasicSubscriber, where each empty line represents BasicPublisher being bounced. After the 7th bounce, the threads see the first few messages (1-4) replayed. However, it doesn't seem to be gap filling, because not all of the messages are replayed. After the 8th bounce, the threads see another few messages replayed (5-8).

recvd[1] 1
recvd[2] 1
recvd[1] 2
recvd[2] 2
recvd[1] 3
recvd[2] 3
recvd[1] 4
recvd[2] 4

recvd[2] 5
recvd[1] 5
recvd[1] 6
recvd[2] 6
recvd[2] 7
recvd[1] 7

recvd[2] 8
recvd[1] 8
recvd[1] 9
recvd[2] 9
recvd[2] 10
recvd[1] 10
recvd[1] 11
recvd[2] 11

recvd[1] 12
recvd[2] 12
recvd[1] 13
recvd[2] 13
recvd[1] 14
recvd[2] 14
recvd[1] 15
recvd[2] 15

recvd[1] 16
recvd[2] 16
recvd[1] 17
recvd[2] 17
recvd[1] 18
recvd[2] 18
recvd[1] 19
recvd[2] 19

recvd[1] 20
recvd[2] 20
recvd[1] 21
recvd[2] 21
recvd[1] 22
recvd[2] 22

recvd[1] 23
recvd[2] 23
recvd[1] 24
recvd[2] 24
recvd[1] 25
recvd[2] 25
recvd[1] 26
recvd[2] 26

recvd[1] 1
recvd[2] 1
recvd[1] 2
recvd[1] 3
recvd[1] 4
recvd[2] 2
recvd[2] 3
recvd[2] 27
recvd[2] 4
recvd[2] 28
recvd[2] 29
recvd[2] 30
recvd[1] 31
recvd[2] 31
recvd[1] 32
recvd[2] 32
recvd[1] 33
recvd[2] 33
recvd[1] 34
recvd[2] 34
recvd[1] 35
recvd[2] 35
recvd[1] 36
recvd[2] 36

recvd[1] 5
recvd[1] 5
recvd[1] 6
recvd[1] 7
recvd[2] 5
recvd[2] 6
recvd[2] 37
recvd[2] 7
recvd[2] 38
recvd[2] 39
recvd[1] 40
recvd[2] 40
recvd[1] 41
recvd[2] 41
recvd[1] 42
recvd[2] 42
recvd[1] 43
recvd[2] 43
recvd[1] 44
recvd[2] 44
recvd[1] 45
recvd[2] 45
recvd[1] 46
recvd[2] 46

Oct 12 '23 17:10 maxgawason

I don't think this is an MT-safe issue. Each time a new publication is started, it is a new image. You can see this if you add the callbacks for available images and unavailable images and by looking at the header on the poll handler and looking at the session id. Aeron doesn't keep ordering across images as they are new streams. My guess is that there is an older image that is seen again on one of the iterations. Adding the info above would show that.
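
One way to check this guess, sketched in plain C under stated assumptions: keep the last value seen per session id, so that a value arriving on an already known session can be told apart from the first message of a new image. The names below (`session_tracker_t`, `session_tracker_observe`, `MAX_SESSIONS`) are hypothetical and not part of the Aeron API. The session id and message value are passed in as plain integers; in a real poll handler they would have to be read from the fragment header and the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Aeron: tracks ordering per session id. */
#define MAX_SESSIONS 16

typedef enum
{
    MSG_NEW_SESSION,   /* first message seen for this session id (a new image) */
    MSG_IN_ORDER,      /* value is greater than the last one seen on this session */
    MSG_OUT_OF_ORDER,  /* value is not greater: a repeat or reorder within one session */
    MSG_TABLE_FULL     /* no room left to track another session */
}
msg_status_t;

typedef struct
{
    int32_t session_id;
    int64_t last_seq;
}
session_entry_t;

typedef struct
{
    session_entry_t entries[MAX_SESSIONS];
    size_t count;
}
session_tracker_t;

void session_tracker_init(session_tracker_t *tracker)
{
    assert(tracker != NULL);
    tracker->count = 0;
}

/* Classify one message by comparing it only against its own session. */
msg_status_t session_tracker_observe(
    session_tracker_t *tracker, int32_t session_id, int64_t seq)
{
    assert(tracker != NULL);

    for (size_t i = 0; i < tracker->count; i++)
    {
        if (tracker->entries[i].session_id == session_id)
        {
            if (seq > tracker->entries[i].last_seq)
            {
                tracker->entries[i].last_seq = seq;
                return MSG_IN_ORDER;
            }

            return MSG_OUT_OF_ORDER;
        }
    }

    if (tracker->count == MAX_SESSIONS)
    {
        return MSG_TABLE_FULL;
    }

    tracker->entries[tracker->count].session_id = session_id;
    tracker->entries[tracker->count].last_seq = seq;
    tracker->count++;

    return MSG_NEW_SESSION;
}
```

With one tracker per polling thread, a low value that arrives after a bounce and is reported as `MSG_NEW_SESSION` belongs to a new image, while `MSG_OUT_OF_ORDER` means it arrived on a session that had already delivered a higher value, which would point at an older image being seen again.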

Oct 12 '23 18:10 tmontgomery

Also, we've identified in #1512 that an image for 2 subscriptions (which is the case here) is not being handled correctly. Not sure if it would show up like this, but it is possible. Worth trying that to see if things behave a little better.

Oct 12 '23 18:10 tmontgomery

I can reproduce using #1512.

Oct 12 '23 18:10 maxgawason

I've checked out the code locally and run it, and I can't reproduce the behaviour that you are seeing (with the latest from master). Can you retest and, if it is still an issue, provide a bit more detail on how you are stopping and starting the publisher?

May 26 '24 23:05 mikeb01

Closing due to inactivity.

Aug 02 '24 10:08 JPWatson