ucx
GTEST/UD: Increase UD EP timeout when running under valgrind
What
Fix for RM#3886801
I was able to reproduce this issue on rock machines in 100% of runs, but only when the test runs under high CPU load. I generate that load with 64 dummy processes (yes > /dev/null). I checked the ud_ep timeout logic and it seems to work correctly, so the timeout itself appears to be too short for a valgrind run on a loaded machine. The reasonable fix is therefore to increase UCX_UD_TIMEOUT from 30s to 300s when running under valgrind. With the increased timeout the issue is no longer reproducible, even with the artificial CPU load.
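
For anyone who wants to recreate the load, here is a minimal sketch of how the 64 dummy processes could be started and cleaned up. The exact script used for the reproduction is not part of this PR; the helper names start_load and stop_load and the LOAD_PIDS variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: start_load / stop_load / LOAD_PIDS are hypothetical helpers,
# not part of UCX or its test scripts.

LOAD_PIDS=""

# Start N background busy-loop processes (default 64), each running
# "yes > /dev/null" as described above, and remember their PIDs.
start_load() {
    n="${1:-64}"
    i=0
    while [ "$i" -lt "$n" ]; do
        yes > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# Kill every process started by start_load and reap it.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
    done
    for pid in $LOAD_PIDS; do
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}
```

Calling start_load 64 before launching the test under valgrind, and stop_load afterwards, should approximate the conditions described above. The process count that triggers the failure may differ on machines with a different number of cores.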