vsomeip
vsomeip copied to clipboard
Possible race condition with selective events
Considering the following network architecture with 2 different devices connected to the same network, with the following applications, where the first row represents the central routing managers of each device:
Device 1 | Device 2 |
---|---|
vsomeipd | Application 2 |
Application 1 | Application 3 |
Application 1 is offering a selective event and Application 2 and 3 will be subscribing to it, so Application 2 will send 2 SOME/IP-SD Subscribe messages to vsomeipd
.
In their selective options, the first will only contain it's own ID [2], while the second will contain both it's ID and Application 3 ID [2, 3].
When vsomeipd
receives the first SUBSCRIBE message, it will create a new remote_subscription
whose only client will be App 2 with the PENDING
state, lets say it was stored in the 0x1234 memory position, so:
remote_subscription (0x1234) - clients_{2: PENDING}, parent_: nullptr
vsomeipd
will then proceed to send a VSOMEIP_SUBSCRIBE
local message to Application 1, and the PENDING
state will only be updated to ACKED
once it replies with a VSOMEIP_SUBSCRIBE_ACK
message containing client 2.
Now, if the second SUBSCRIBE message arrives and is processed before client 2 remote subscription state is changed to ACKED
, then in routing_manager_impl::on_remote_subscribe()
, in the call to eventgroupinfo::update_remote_subscription()
, as both instances are "equal", the newly created remote_subscription
will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription
:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr
remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: 0x4321
Also, since the 0x1234 instance already had client 2, the update_remote_subscription
will only insert client 3 in its _changed
parameter, meaning that later in the send_subscription
call, it will only send a VSOMEIP_SUBSCRIBE
message containing client 3 to Application 1.
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2742-L2757
After that, vsomeipd
will eventually receive the VSOMEIP_SUBSCRIBE_ACK
message with client 2 from Application 1 and will update the 0x1234 remote_subscription
state of client 2 to ACKED
in the routing_manager_impl::on_subscribe_ack()
method, but no SUBSCRIBE_ACK
Service Discovery message will be sent as the service_discovery_impl::update_remote_subscription()
call will only send the ACK if the remote_subscription
is not pending, since client 3 state is still PENDING
in the 0x1234 subscription that is not the case.
When the next VSOMEIP_SUBSCRIBE_ACK
message with client 3 is received, the remote_subscription
0x4321 will be updated, client 3 state will then be ACKED
, which is also propagated to it's parent 0x1234. Since 0x4321 client 2 state is still PENDING
it will not be removed from the subscriptions_
collection of eventgroupinfo
in
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2844-L2862
Consequently, no SUBSCRIBE_ACK
message will be sent either, so we end up with the following state of the eventgroups subscriptions_
collection:
remote_subscription (0x1234) - clients_{2: ACKED, 3: ACKED}, parent_: nullptr
remote_subscription (0x4321) - clients_{2: PENDING, 3: ACKED}, parent_: 0x4321
Once this happens, subsequent SUBSCRIBE
messages sent by Application 2 will always contain the same clients in its selective option [2, 3] and no change will happen, which seems would be causing it to either enter the !_subscription->is_pending()
branch or the else
branch in eventgroupinfo::update_remote_subscription()
, depending on which existing remote_subscription
is iterated first.
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/eventgroupinfo.cpp#L211-L250
In the first case, it will inherit the state of 0x1234 and not be pending anymore, so it should send the SUBSCRIBE_ACK
back to Application 2 and resume normal behavior(?), whereas in the second case it will be assigned set_answers(0)
and not send any SUBSCRIBE_ACK
back as it would enter the do_not_answer
branch of service_discovery::send_subscription_ack()
.
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/service_discovery/src/service_discovery_impl.cpp#L3557-L3593
If in the send_subscription()
call we would use something like _subscription.get_pending_clients()
instead of using its_added
when receiving new clients for a selective event, this issue would disappear at the expense of some additional local messages.
Did I miss something or is my reasoning correct?
First of all thank you for the thorough analysis! I am currently trying to reconstruct the (possible) issue and found one thing I do not understand regarding the analysis. It is within this part:
"the newly created remote_subscription will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: 0x4321"
If I read "eventgroupinfo::update_remote_subscription" correctly, client 2 is not copied to the second remote subscription that contains client 3. Thus we end up with:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr remote_subscription (0x4321) - clients_{3: PENDING}, parent_: 0x1234"
instead and the answer to the second subscription is sent as soon as the application acks it.
But because the 2nd SUBSCRIBE message that Application 2 sends to vsomeipd
has clients [2, 3] in its selective options, won't the new subscription be:
remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr"
before even entering eventgroupinfo::update_remote_subscription
?
Yes, you are right. Probably we should let the remote_subscription objects know their children to keep the update in sync.