openthread
openthread copied to clipboard
[message-pool] evict message from coap pending request queue if send queue is empty
If the send queue is empty, currently the messages are not evicted and this could cause received messages to be dropped when too many pending address query notifications. The changes made are to evict the message from head of the coap pending request queue if send queue is empty and new buffer cannot be allocated
Size Report of OpenThread
Merging #9984 into main(0e36799110e0855694751995a9759182aef31766).
name | branch | text | data | bss | total |
---|---|---|---|---|---|
ot-cli-ftd | main | 467112 | 760 | 66364 | 534236 |
#9984 | 467240 | 760 | 66364 | 534364 | |
+/- | +128 | 0 | 0 | +128 | |
ot-ncp-ftd | main | 436380 | 760 | 61576 | 498716 |
#9984 | 436516 | 760 | 61576 | 498852 | |
+/- | +136 | 0 | 0 | +136 | |
libopenthread-ftd.a | main | 236152 | 0 | 40310 | 276462 |
#9984 | 236270 | 0 | 40310 | 276580 | |
+/- | +118 | 0 | 0 | +118 | |
libopenthread-cli-ftd.a | main | 57406 | 0 | 8075 | 65481 |
#9984 | 57406 | 0 | 8075 | 65481 | |
+/- | 0 | 0 | 0 | 0 | |
libopenthread-ncp-ftd.a | main | 31857 | 0 | 5916 | 37773 |
#9984 | 31857 | 0 | 5916 | 37773 | |
+/- | 0 | 0 | 0 | 0 | |
ot-cli-mtd | main | 364888 | 760 | 51220 | 416868 |
#9984 | 365008 | 760 | 51220 | 416988 | |
+/- | +120 | 0 | 0 | +120 | |
ot-ncp-mtd | main | 347420 | 760 | 46448 | 394628 |
#9984 | 347564 | 760 | 46448 | 394772 | |
+/- | +144 | 0 | 0 | +144 | |
libopenthread-mtd.a | main | 158411 | 0 | 25182 | 183593 |
#9984 | 158529 | 0 | 25182 | 183711 | |
+/- | +118 | 0 | 0 | +118 | |
libopenthread-cli-mtd.a | main | 39787 | 0 | 8059 | 47846 |
#9984 | 39787 | 0 | 8059 | 47846 | |
+/- | 0 | 0 | 0 | 0 | |
libopenthread-ncp-mtd.a | main | 24737 | 0 | 5916 | 30653 |
#9984 | 24737 | 0 | 5916 | 30653 | |
+/- | 0 | 0 | 0 | 0 | |
ot-cli-ftd-br | main | 536064 | 768 | 131076 | 667908 |
#9984 | 536208 | 768 | 131076 | 668052 | |
+/- | +144 | 0 | 0 | +144 | |
libopenthread-ftd-br.a | main | 299705 | 5 | 104998 | 404708 |
#9984 | 299823 | 5 | 104998 | 404826 | |
+/- | +118 | 0 | 0 | +118 | |
libopenthread-cli-ftd-br.a | main | 71069 | 0 | 8099 | 79168 |
#9984 | 71069 | 0 | 8099 | 79168 | |
+/- | 0 | 0 | 0 | 0 | |
ot-rcp | main | 62248 | 564 | 20604 | 83416 |
#9984 | 62248 | 564 | 20604 | 83416 | |
+/- | 0 | 0 | 0 | 0 | |
libopenthread-rcp.a | main | 9542 | 0 | 5052 | 14594 |
#9984 | 9542 | 0 | 5052 | 14594 | |
+/- | 0 | 0 | 0 | 0 | |
libopenthread-radio.a | main | 18821 | 0 | 214 | 19035 |
#9984 | 18821 | 0 | 214 | 19035 | |
+/- | 0 | 0 | 0 | 0 |
Thinking more this PR and #9949, some aspects to consider:
TMF Eviction Impact: This PR allows any message (regardless of priority) to possibly evict a TMF agent request. Since TMF messages are generally high priority themselves, should we add checks for such evictions (e.g., based on priority)?
Investigating Address Query/Notification Overload: From #9949, this change aims to handle situations where a device receives many address queries, leading to frequent TMF address notifications. It may help to delve deeper into this and understand it better (address the root cause):
- Query Origin: Are address queries coming from a single device/BR or multiple sources?
- Rate Limiting: If queries originate from the same device, are existing query rate-limiting mechanisms functioning correctly? Are the limits appropriate?
- CoAP Acknowledgement: TMF address notification messages are high-priority, confirmable CoAP messages. Upon receiving a CoAP acknowledgement, they should be swiftly removed from the request queue. Why is ack message being delayed, leading to request pile-up? (particularly since device seem to keep receiving query message from the same sender?)
Thinking more this PR and #9949, some aspects to consider:
TMF Eviction Impact: This PR allows any message (regardless of priority) to possibly evict a TMF agent request. Since TMF messages are generally high priority themselves, should we add checks for such evictions (e.g., based on priority)?
thanks for your inputs@abtink!
The eviction ordering is to first evict low priority messages in send queue, high priority messages in send queue and only if that is empty to evict TMF agent request right? Also we evict from the head of the request queue, which means TMF agent messages are consuming lot of messages and one at the head is already in the buffers for a long time (In the log, I see around 26s to go through all the retransmission attempts)?
Please let me know if anything else can be prioritized ahead of TMF agent messages? The other thought is MLE queue or MPL queues. The reassembly queues anyway have a limit of 2s after which it would be discarded?
Investigating Address Query/Notification Overload: From #9949, this change aims to handle situations where a device receives many address queries, leading to frequent TMF address notifications. It may help to delve deeper into this and understand it better (address the root cause):
- Query Origin: Are address queries coming from a single device/BR or multiple sources?
- Rate Limiting: If queries originate from the same device, are existing query rate-limiting mechanisms functioning correctly? Are the limits appropriate?
- CoAP Acknowledgement: TMF address notification messages are high-priority, confirmable CoAP messages. Upon receiving a CoAP acknowledgement, they should be swiftly removed from the request queue. Why is ack message being delayed, leading to request pile-up? (particularly since device seem to keep receiving query message from the same sender?)
Yes, I agree, from the given logs, it is not clear why we see so frequent address queries from the accessory (with no visibility here), but I suspect there is some underlying matter stack related issue, causing multiple accessories to go into a crash-loop (running out of buffers) and hence do not handle the AddressNotify.
The changes here would still help to mitigate the Border Router being able to receive from other accessories (protecting its RX buffers) in case we encounter scenario of misbehaving accessories?
Thanks @sarveshkumarv3 for submitting the PR and all the updates LGTM.
Couple of smaller suggestions below.
On this:
Also we evict from the head of the request queue, which means TMF agent messages are consuming lot of messages and one at the head is already in the buffers for a long time (In the log, I see around 26s to go through all the retransmission attempts)?
I cannot think of any good way to use the priority when deciding to evict from TMF agent queue. So the approach in the PR sounds reaonable.
thanks for the inputs @abtink, incorporated the changes now
Codecov Report
Attention: Patch coverage is 86.66667%
with 2 lines
in your changes are missing coverage. Please review.
Project coverage is 82.32%. Comparing base (
1fb1d9a
) to head (5710ebb
). Report is 5 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #9984 +/- ##
===========================================
+ Coverage 68.31% 82.32% +14.01%
===========================================
Files 494 549 +55
Lines 57933 68748 +10815
===========================================
+ Hits 39577 56600 +17023
+ Misses 18356 12148 -6208
Files | Coverage Δ | |
---|---|---|
src/core/coap/coap.hpp | 84.61% <ø> (+60.61%) |
:arrow_up: |
src/core/common/message.cpp | 95.88% <100.00%> (+1.30%) |
:arrow_up: |
src/core/coap/coap.cpp | 81.22% <75.00%> (+35.15%) |
:arrow_up: |