vsomeip
vsomeip copied to clipboard
Possible race condition with selective events
Considering the following network architecture with 2 different devices connected to the same network, with the following applications, where the first row represents the central routing managers of each device:
| Device 1 | Device 2 |
|---|---|
| vsomeipd | Application 2 |
| Application 1 | Application 3 |
Application 1 is offering a selective event and Application 2 and 3 will be subscribing to it, so Application 2 will send 2 SOME/IP-SD Subscribe messages to vsomeipd.
In their selective options, the first will only contain it's own ID [2], while the second will contain both it's ID and Application 3 ID [2, 3].
When vsomeipd receives the first SUBSCRIBE message, it will create a new remote_subscription whose only client will be App 2 with the PENDING state, lets say it was stored in the 0x1234 memory position, so:
remote_subscription (0x1234) - clients_{2: PENDING}, parent_: nullptr
vsomeipd will then proceed to send a VSOMEIP_SUBSCRIBE local message to Application 1, and the PENDING state will only be updated to ACKED once it replies with a VSOMEIP_SUBSCRIBE_ACK message containing client 2.
Now, if the second SUBSCRIBE message arrives and is processed before client 2 remote subscription state is changed to ACKED, then in routing_manager_impl::on_remote_subscribe(), in the call to eventgroupinfo::update_remote_subscription(), as both instances are "equal", the newly created remote_subscription will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr
remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: 0x4321
Also, since the 0x1234 instance already had client 2, the update_remote_subscription will only insert client 3 in its _changed parameter, meaning that later in the send_subscription call, it will only send a VSOMEIP_SUBSCRIBE message containing client 3 to Application 1.
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2742-L2757
After that, vsomeipd will eventually receive the VSOMEIP_SUBSCRIBE_ACK message with client 2 from Application 1 and will update the 0x1234 remote_subscription state of client 2 to ACKED in the routing_manager_impl::on_subscribe_ack() method, but no SUBSCRIBE_ACK Service Discovery message will be sent as the service_discovery_impl::update_remote_subscription() call will only send the ACK if the remote_subscription is not pending, since client 3 state is still PENDING in the 0x1234 subscription that is not the case.
When the next VSOMEIP_SUBSCRIBE_ACK message with client 3 is received, the remote_subscription 0x4321 will be updated, client 3 state will then be ACKED, which is also propagated to it's parent 0x1234. Since 0x4321 client 2 state is still PENDING it will not be removed from the subscriptions_ collection of eventgroupinfo in
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2844-L2862
Consequently, no SUBSCRIBE_ACK message will be sent either, so we end up with the following state of the eventgroups subscriptions_ collection:
remote_subscription (0x1234) - clients_{2: ACKED, 3: ACKED}, parent_: nullptr
remote_subscription (0x4321) - clients_{2: PENDING, 3: ACKED}, parent_: 0x4321
Once this happens, subsequent SUBSCRIBE messages sent by Application 2 will always contain the same clients in its selective option [2, 3] and no change will happen, which seems would be causing it to either enter the !_subscription->is_pending() branch or the else branch in eventgroupinfo::update_remote_subscription(), depending on which existing remote_subscription is iterated first.
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/eventgroupinfo.cpp#L211-L250
In the first case, it will inherit the state of 0x1234 and not be pending anymore, so it should send the SUBSCRIBE_ACK back to Application 2 and resume normal behavior(?), whereas in the second case it will be assigned set_answers(0) and not send any SUBSCRIBE_ACK back as it would enter the do_not_answer branch of service_discovery::send_subscription_ack().
https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/service_discovery/src/service_discovery_impl.cpp#L3557-L3593
If in the send_subscription() call we would use something like _subscription.get_pending_clients() instead of using its_added when receiving new clients for a selective event, this issue would disappear at the expense of some additional local messages.
Did I miss something or is my reasoning correct?
First of all thank you for the thorough analysis! I am currently trying to reconstruct the (possible) issue and found one thing I do not understand regarding the analysis. It is within this part:
"the newly created remote_subscription will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: 0x4321"
If I read "eventgroupinfo::update_remote_subscription" correctly, client 2 is not copied to the second remote subscription that contains client 3. Thus we end up with:
remote_subscription (0x1234) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr remote_subscription (0x4321) - clients_{3: PENDING}, parent_: 0x1234"
instead and the answer to the second subscription is sent as soon as the application acks it.
But because the 2nd SUBSCRIBE message that Application 2 sends to vsomeipd has clients [2, 3] in its selective options, won't the new subscription be:
remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr"
before even entering eventgroupinfo::update_remote_subscription?
Yes, you are right. Probably we should let the remote_subscription objects know their children to keep the update in sync.