erpc

[BUG] Race condition two threads getting expected reply error

Open amgross opened this issue 1 year ago • 4 comments

Describe the bug

In short: when two client threads send requests to the server at the same time, each may receive the other's reply and therefore return kErpcStatus_ExpectedReply because of a wrong sequence number.

Deep dive into the scenario: when the simple (non-arbitrated) client is used, the function performClientRequest sends and receives without holding a mutex across the whole request (the send and the receive are each locked individually in the framed transport class).

That may lead to the following sequence of steps:

  1. Thread A sends its request.
  2. A context switch happens, and thread B runs and sends its request (likely if B has higher priority, tried to send while A was mid-send, and was blocked on the send lock until A finished sending).
  3. Thread B enters receive and blocks or busy-waits for a response.
  4. The server gets request A (as it was sent first) and responds to it.
  5. Thread B gets the response to A, returns kErpcStatus_ExpectedReply due to the wrong sequence number, and releases the lock.
  6. Thread A enters receive.
  7. The server gets request B (as it was sent second) and responds to it.
  8. Thread A gets the response to B and returns kErpcStatus_ExpectedReply due to the wrong sequence number.
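
To make the interleaving concrete, here is a minimal sketch that does not use eRPC at all. `LoopbackTransport`, `SerializedClient`, `interleavedReplyMatches` and `countMismatchesSerialized` are hypothetical names invented for this illustration. The transport locks send and receive individually, as described above, and a fake server simply answers in arrival order. One possible workaround, assumed here rather than taken from eRPC, is to hold a single mutex across the whole send/receive pair.

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a framed transport plus a server that answers
// requests in arrival order. send() and receive() are locked individually.
class LoopbackTransport {
public:
    void send(uint32_t seq) {
        std::lock_guard<std::mutex> sendGuard(m_sendLock);
        std::lock_guard<std::mutex> queueGuard(m_queueLock);
        m_replies.push_back(seq); // the "server" echoes the sequence number
    }
    // Assumes at least one reply is queued (true for every caller below).
    uint32_t receive() {
        std::lock_guard<std::mutex> recvGuard(m_recvLock);
        std::lock_guard<std::mutex> queueGuard(m_queueLock);
        uint32_t seq = m_replies.front();
        m_replies.pop_front();
        return seq;
    }
private:
    std::mutex m_sendLock, m_recvLock, m_queueLock;
    std::deque<uint32_t> m_replies;
};

// Hypothetical wrapper: one mutex held across the whole request.
class SerializedClient {
public:
    explicit SerializedClient(LoopbackTransport &t) : m_transport(t) {}
    // Returns true when the reply carries the sequence number that was sent.
    bool request(uint32_t seq) {
        std::lock_guard<std::mutex> guard(m_requestLock);
        m_transport.send(seq);
        return m_transport.receive() == seq;
    }
private:
    LoopbackTransport &m_transport;
    std::mutex m_requestLock;
};

// Replays steps 1-5 by hand on a single thread: A sends, B sends, B receives.
bool interleavedReplyMatches() {
    LoopbackTransport t;
    t.send(1);               // thread A sends
    t.send(2);               // thread B sends
    return t.receive() == 2; // B reads A's reply, so the check fails
}

// Runs two threads through the serialized client and counts mismatches.
int countMismatchesSerialized(int iterations) {
    LoopbackTransport t;
    SerializedClient client(t);
    std::atomic<int> mismatches{0};
    auto worker = [&](uint32_t base) {
        for (int i = 0; i < iterations; ++i) {
            if (!client.request(base + static_cast<uint32_t>(i))) {
                ++mismatches;
            }
        }
    };
    std::thread a(worker, 0u);
    std::thread b(worker, 1000000u);
    a.join();
    b.join();
    return mismatches.load();
}
```

Calling `interleavedReplyMatches()` reproduces the mismatch without relying on scheduler timing, while `countMismatchesSerialized(1000)` should report no mismatches because a second request cannot start until the first has read its reply. The cost of this approach is that requests from different threads are fully serialized.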

It should be noted that this probably won't happen with the arbitrated client, where each client is assigned its own sequence number and the arbitrator wakes up the relevant thread according to the sequence number.
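
The idea of waking the right thread by sequence number can be sketched as follows. This is not eRPC's arbitrator code; `ReplyRouter`, `deliver` and `waitFor` are hypothetical names used only to illustrate the general technique of routing each reply to the caller that registered its sequence number.

```cpp
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical reply router: whoever reads a reply off the wire hands it to
// the caller waiting on that reply's sequence number.
class ReplyRouter {
public:
    // Called by the thread that read a reply from the transport.
    void deliver(uint32_t seq, const std::string &payload) {
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_ready[seq] = payload;
        }
        m_cv.notify_all(); // wake all waiters; each re-checks its own number
    }

    // Blocks until the reply for this sequence number has been delivered.
    std::string waitFor(uint32_t seq) {
        std::unique_lock<std::mutex> guard(m_lock);
        m_cv.wait(guard, [&] { return m_ready.count(seq) != 0; });
        std::string payload = m_ready[seq];
        m_ready.erase(seq);
        return payload;
    }

private:
    std::mutex m_lock;
    std::condition_variable m_cv;
    std::map<uint32_t, std::string> m_ready; // replies not yet claimed
};
```

With this structure, replies arriving in the "wrong" order are harmless: a caller waiting on sequence number 1 never sees the payload for sequence number 2. The sketch ignores timeouts, which is where the theoretical race mentioned later in this thread would come in.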

To Reproduce

Run two client threads with different priorities in long send/receive loops.

Expected behavior

Each thread gets its own response.

Screenshots

Not applicable

Desktop (please complete the following information)

  • OS: linux
  • eRPC Version: 1.10.0

Steps you didn't forget to do

  • [x] I checked if there is no related issue opened/closed.
  • [x] I checked that there doesn't exist opened PR which is solving this issue.

Additional context

amgross avatar Nov 17 '24 09:11 amgross

I think it is related to #374

amgross avatar Nov 17 '24 13:11 amgross

I have a related issue I haven't filed yet. It may be related to this one, though I haven't fully qualified what is going on. But I have a question: is calling client methods from different threads supported behavior, or should the calls be synchronized? I know the ArbitratedClientManager handles calls between the server thread and the client thread, but what about multiple client threads? Is that a supported use case?

djmuhlestein avatar Apr 01 '25 22:04 djmuhlestein

If you are using the arbitrated client manager, I think you are OK with multiple threads calling in parallel (there is a theoretical race condition when a timeout occurs at the same time as a response arrives).

amgross avatar Apr 02 '25 08:04 amgross

> If you are using the arbitrated client manager, I think you are OK with multiple threads calling in parallel (there is a theoretical race condition when a timeout occurs at the same time as a response arrives).

Thanks, I'll continue trying to debug my problem and see if I can create a reproducing test case.

djmuhlestein avatar Apr 02 '25 14:04 djmuhlestein