increase spinquic watchdog timeout
Description
As discussed in issue #5491, the logs show that the watchdog assert is firing. For now, let's increase the timeout by 100%.
Testing
CI
Documentation
N/A
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 85.64%. Comparing base (4e84609) to head (18ddaa0).
:warning: Report is 1 commit behind head on main.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5647      +/-   ##
==========================================
- Coverage   86.34%   85.64%   -0.71%
==========================================
  Files          60       60
  Lines       18663    18663
==========================================
- Hits        16114    15983     -131
- Misses       2549     2680     +131
I am not that familiar with the spin test, but won't this only make the spin test run for longer? From a quick look at the sources, the time you changed controls the time spent spinning, and there is a WATCHDOG_WIGGLE_ROOM that gives the watchdog a bit of extra time.
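To illustrate the concern, here is a minimal sketch of how the two values might relate, assuming the watchdog deadline is simply the spin duration plus a fixed wiggle room. The function names and the numbers are hypothetical and are not copied from the SpinQuic sources.

```cpp
#include <cstdint>

// Hypothetical default for illustration only; the real WATCHDOG_WIGGLE_ROOM
// value in SpinQuic may differ.
constexpr uint32_t kWatchdogWiggleRoomMs = 2000;

// Assumed relationship: the watchdog fires only after the spin run time
// plus a fixed grace period has elapsed.
constexpr uint32_t WatchdogDeadlineMs(
    uint32_t spinDurationMs,
    uint32_t wiggleRoomMs = kWatchdogWiggleRoomMs) {
    return spinDurationMs + wiggleRoomMs;
}

// Time left for cleanup once spinning has stopped. Under the assumption
// above, this depends only on the wiggle room, not on the spin duration.
constexpr uint32_t ShutdownSlackMs(
    uint32_t spinDurationMs,
    uint32_t wiggleRoomMs = kWatchdogWiggleRoomMs) {
    return WatchdogDeadlineMs(spinDurationMs, wiggleRoomMs) - spinDurationMs;
}

// Doubling the spin duration leaves the slack unchanged.
static_assert(ShutdownSlackMs(10000) == ShutdownSlackMs(20000),
              "slack is independent of spin duration");
```

If the real code follows this shape, only a larger wiggle room would give the watchdog more headroom.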
Yes! Good catch.
Did you investigate, based on the traces, what was pending when the timeout fired? 2 or 3 seconds is already quite a lot. Something may have been delayed on a slow VM, but it is also possible that a softlock or deadlock was happening in MsQuic.
Based on the ETL trace from the link I added in the issue, I couldn't find any deadlocks. There are comments in SpinQuic itself noting that certain code paths will lead to deadlocks, but those are all disabled.
OK. This might help, but I suspect going from 2 seconds to 3 seconds won't be a definitive fix. We should make sure dumps are collected so that next time we can check the state of the pending threads.
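As a rough sketch of the idea, a watchdog could run a diagnostic hook before it asserts, so that state is captured while the stuck threads are still alive. The `Watchdog` class below is hypothetical and is not the SpinQuic implementation. How a dump is actually written is platform specific and is left to the callback.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical watchdog for illustration only.
class Watchdog {
public:
    // onTimeout runs on the watchdog thread if the timeout expires before
    // the Watchdog object is destroyed. A real test might collect a process
    // dump there and then assert.
    Watchdog(std::chrono::milliseconds timeout, std::function<void()> onTimeout)
        : Thread([this, timeout, cb = std::move(onTimeout)] {
              std::unique_lock<std::mutex> lock(Mutex);
              // wait_for returns false when the timeout elapsed and Done is
              // still false, meaning the guarded work did not finish in time.
              if (!Cv.wait_for(lock, timeout, [this] { return Done; })) {
                  cb();
              }
          }) {}

    // Destroying the watchdog signals that the guarded work completed.
    ~Watchdog() {
        {
            std::lock_guard<std::mutex> lock(Mutex);
            Done = true;
        }
        Cv.notify_one();
        Thread.join();
    }

    Watchdog(const Watchdog&) = delete;
    Watchdog& operator=(const Watchdog&) = delete;

private:
    std::mutex Mutex;
    std::condition_variable Cv;
    bool Done = false;
    // Declared last so the members above are initialized before the thread starts.
    std::thread Thread;
};
```

Running the hook before the assert keeps the collection step inside the test itself, rather than relying on the CI environment to capture a crash dump afterwards.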