zenoh
zenoh copied to clipboard
Close the links after closing the Face
This is another attempt to minimize the occurrence of the race condition described in #1886.
When loosing the connection to a peer, if the send buffer happens to fill up before the keep alive timeout, the tx_task can be stuck in the send_batch operation. On my computer, it can be stuck for 15 minutes before returning with EHOSTUNREACH ("No route to host").
Now this is not that bat itself, it's just a dangling task, a TransportLinkUnicastUniversal and a TransportUnicastUniversal. But this make the race condition mentioned in #1886 very likely: if the peer reconnect within those 15 minutes, the re-connection is bogus.
This "fix" make TransportUnicastUniversal::delete() notify the RuntimeSession of the transport deletion before closing the link, instead of after closing the link. It doesn't seem to cause any issue. I didn't do the same in TransportUnicastLowlatency because I am not sure it is necessary.
I am not really happy with this hack, because I don't fully understand the consequences of the fix and it just make the race-condition a lot less likely (15min time span vs a few milliseconds). Have a look at #1886 for more information.
PR missing one of the required labels: {'dependencies', 'breaking-change', 'enhancement', 'internal', 'bug', 'new feature', 'documentation'}
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 70.89%. Comparing base (b33ddc3) to head (a120311).
:warning: Report is 53 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1979 +/- ##
==========================================
- Coverage 71.03% 70.89% -0.15%
==========================================
Files 364 364
Lines 61519 61519
==========================================
- Hits 43699 43612 -87
- Misses 17820 17907 +87
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.