vsomeip
[BUG]: Missing TP segments corrupt all following TP messages
vSomeip Version
v3.4.10
Boost Version
any
Environment
All
Describe the bug
When testing communication with our sensors, we occasionally saw failing E2E CRC checks. Once the problem occurs, it persists until communication is reset. The messages are notifications, segmented using TP and protected with E2E.
My understanding of the problem is the following: if a single TP segment gets lost, the vsomeip TP reassembler cannot finish this message. So far so good.
However, the next message is then received, segment by segment, while the old message is still waiting to be completed. For the first few segments we might get a duplicate segment error. As soon as the segment that was missing from the old message arrives, the message is regarded as complete and returned. Then the E2E check is performed. Because the message has been reassembled from segments of two consecutive messages, the CRC check fails and we get garbage data.
From then on, all messages are reassembled from mixed segments, without any duplicate segment error in the log. Hence the CRC fails for every message and the data is garbage.
e.g.:
- message consists of 6 segments (0...5)
- receive segments 0,2,3,4,5 and lose segment 1 from the first message
- receive segment 0 of second message -> duplicate segment error
- receive segment 1 of second message -> message is complete and returned --> CRC error
- segment 2,3,4,5 added to new tp message
- receive segments 0 and 1 from third message -> added to previous message, complete and return --> CRC error
- ...
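The sequence above can be replayed with a small model. This is only a sketch of the behaviour as I understand it, not vsomeip's actual code: `NaiveReassembler`, the fixed segment count and the message labels are all made up for illustration, and segments are identified by index instead of byte offset.

```python
# Simplified model of the behaviour described above (NOT vsomeip code).
# Assumption: one pending message per sender, segments keyed by index,
# and a message counts as complete once every index has been seen.

SEGMENTS_PER_MESSAGE = 6


class NaiveReassembler:
    def __init__(self, segment_count):
        self.segment_count = segment_count
        self.pending = {}          # segment index -> (message label, index)
        self.duplicate_errors = 0

    def feed(self, message_no, index):
        """Add one segment; return the reassembled message once complete."""
        if index in self.pending:
            # Slot already filled by the old, incomplete message.
            self.duplicate_errors += 1
            return None
        # message_no is only a label for this simulation; the model cannot
        # tell which message a segment really belongs to.
        self.pending[index] = (message_no, index)
        if len(self.pending) == self.segment_count:
            complete = [self.pending[i] for i in range(self.segment_count)]
            self.pending = {}
            return complete
        return None


reassembler = NaiveReassembler(SEGMENTS_PER_MESSAGE)

# Message 1 loses segment 1; messages 2 and 3 arrive completely.
stream = [(1, i) for i in (0, 2, 3, 4, 5)]
stream += [(2, i) for i in range(SEGMENTS_PER_MESSAGE)]
stream += [(3, i) for i in range(SEGMENTS_PER_MESSAGE)]

returned = []
for message_no, index in stream:
    result = reassembler.feed(message_no, index)
    if result is not None:
        returned.append(result)

for message in returned:
    sources = sorted({message_no for message_no, _ in message})
    print("returned message built from messages", sources)
print("duplicate segment errors:", reassembler.duplicate_errors)
```

In this model every returned message contains segments from two different messages, and only the very first collision is reported as a duplicate.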
Reproduction Steps
It's hard to reproduce deliberately: a single TP segment has to be removed from the communication somehow.
Expected behaviour
In my opinion a missing segment should not invalidate all upcoming traffic.
The problem could be resolved by various actions:
- Lower the message reassembly timeout to less than the message cycle time, so that incomplete messages are deleted before the next message arrives. The timeout is currently hardcoded to 5 seconds and hence not very helpful.
- Force a segment with offset zero to start a new TP message, discarding any incomplete message still pending. Segments with other offsets cannot start a new TP message.
I think the first solution is the better one, as it does not impose as many constraints on the order in which the segments arrive.
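To illustrate the first option, here is a minimal sketch with a configurable timeout. It is a toy model, not vsomeip's implementation: the class name, the 100 ms message cycle and the 50 ms timeout are assumptions chosen for the example, and time is passed in explicitly as integer milliseconds.

```python
# Toy model of a reassembly timeout (NOT vsomeip code).
# Assumptions: messages arrive every 100 ms, each has 6 segments, and the
# partial message is dropped once it is older than timeout_ms.

SEGMENTS_PER_MESSAGE = 6


class TimeoutReassembler:
    def __init__(self, segment_count, timeout_ms):
        self.segment_count = segment_count
        self.timeout_ms = timeout_ms
        self.pending = {}       # segment index -> message label
        self.started_at = None  # arrival time of the first pending segment

    def feed(self, label, index, now_ms):
        # Discard a stale, incomplete message before handling the segment.
        if self.pending and now_ms - self.started_at > self.timeout_ms:
            self.pending = {}
        if not self.pending:
            self.started_at = now_ms
        if index in self.pending:
            return None  # duplicate segment, ignored
        self.pending[index] = label
        if len(self.pending) == self.segment_count:
            done = [self.pending[i] for i in range(self.segment_count)]
            self.pending = {}
            return done
        return None


def simulate(timeout_ms):
    """Message A loses segment 1; message B arrives 100 ms later, intact."""
    reassembler = TimeoutReassembler(SEGMENTS_PER_MESSAGE, timeout_ms)
    stream = [("A", i, i) for i in (0, 2, 3, 4, 5)]
    stream += [("B", i, 100 + i) for i in range(SEGMENTS_PER_MESSAGE)]
    results = []
    for label, index, now_ms in stream:
        done = reassembler.feed(label, index, now_ms)
        if done is not None:
            results.append(done)
    return results


long_timeout = simulate(timeout_ms=5000)   # like the hardcoded 5 seconds
short_timeout = simulate(timeout_ms=50)    # shorter than the message cycle
print("5000 ms timeout:", long_timeout)
print("  50 ms timeout:", short_timeout)
```

With the long timeout the model returns a message mixed from A and B; with the timeout below the cycle time the leftovers of A are dropped and B is returned intact.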
Logs and Screenshots
No response
After rereading the SOME/IP TP spec, I changed my mind: the spec is quite specific about when message reassembly should be interrupted and a new message should start. In other words, vsomeip does not follow the spec in this regard.
https://www.autosar.org/fileadmin/standards/R20-11/CP/AUTOSAR_SWS_SOMEIPTransportProtocol.pdf
- In section 7.3.1 it says that a message with offset 0 shall start a new disassembly session.
- In section 7.3.3 it makes clear that the segments have to be received in order.
I will see whether I can fix the problem and file a pull request.
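As a rough sketch of what these two rules would mean for a receiver: the model below is my reading of them, not text from the spec and not vsomeip code. It uses segment indices and a fixed segment count, whereas real SOME/IP-TP segments carry a byte offset and a "more segments" flag; `InOrderReassembler` is a hypothetical name.

```python
# Sketch of the two rules above (NOT vsomeip code, not spec wording).
# Assumptions: segments are numbered 0..N-1 and N is known in advance.

SEGMENTS_PER_MESSAGE = 6


class InOrderReassembler:
    def __init__(self, segment_count):
        self.segment_count = segment_count
        self.segments = None  # None means no reassembly in progress

    def feed(self, label, index):
        if index == 0:
            # Offset zero always starts a new message and discards
            # whatever was still pending.
            self.segments = [label]
        elif self.segments is not None and index == len(self.segments):
            # Exactly the next expected segment.
            self.segments.append(label)
        else:
            # Gap or out-of-order segment: give up on this message.
            self.segments = None
            return None
        if len(self.segments) == self.segment_count:
            done, self.segments = self.segments, None
            return done
        return None


reassembler = InOrderReassembler(SEGMENTS_PER_MESSAGE)

# Same scenario as in the bug description: message 1 loses segment 1.
stream = [(1, i) for i in (0, 2, 3, 4, 5)]
stream += [(2, i) for i in range(SEGMENTS_PER_MESSAGE)]
stream += [(3, i) for i in range(SEGMENTS_PER_MESSAGE)]

delivered = []
for label, index in stream:
    done = reassembler.feed(label, index)
    if done is not None:
        delivered.append(done)

print(delivered)
```

In this model only the damaged message is lost; messages 2 and 3 are delivered unmixed.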
hi @siggie0815 could you try and test with this PR: https://github.com/COVESA/vsomeip/pull/783 and see if this fixes the issue. thanks!