Without keepalive set, sockets in ESTABLISHED state can hang forever
In ssh_libssh2.c, there are three do/while loops that retry forever on LIBSSH2_ERROR_TIMEOUT. Two of those are in _git_ssh_session_create. This is problematic when the libssh2 timeout is caused by a half-open TCP socket that is still in the ESTABLISHED state. The remote end has dropped the connection, but because we are only waiting on a read, the local side never notices, so select() in libssh2 blocks until it times out. libgit2 then loops again, which triggers another select() on the same dead socket.
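To make the pattern concrete, here is a simplified sketch of a retry-on-timeout loop. It is not the actual libgit2 code: `retry_on_timeout`, `dead_socket_step`, and `FAKE_ERROR_TIMEOUT` are hypothetical stand-ins, and the `max_attempts` cap is an assumption added so the example terminates. Dropping the `attempts < max_attempts` check gives the unbounded behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for LIBSSH2_ERROR_TIMEOUT; defined locally so the sketch
 * does not depend on libssh2 headers. */
#define FAKE_ERROR_TIMEOUT (-9)

/* Hypothetical callback type: one blocking step (e.g. a handshake call). */
typedef int (*step_fn)(void *ctx);

/* Retry `step` while it reports a timeout, but give up after
 * `max_attempts` calls and return the timeout to the caller. */
static int retry_on_timeout(step_fn step, void *ctx, int max_attempts)
{
    int rc;
    int attempts = 0;

    assert(step != NULL && max_attempts > 0);

    do {
        rc = step(ctx);
        attempts++;
    } while (rc == FAKE_ERROR_TIMEOUT && attempts < max_attempts);

    return rc;
}

/* Simulates a half-open socket: every call times out. The context is
 * a call counter so the caller can see how often it was invoked. */
static int dead_socket_step(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return FAKE_ERROR_TIMEOUT;
}
```

With a dead peer, every iteration waits out the full timeout and gets the same result, so only a cap or a propagated error ends the loop.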
This can be reproduced pretty easily on a macOS laptop by closing the laptop during a libgit2 operation. After a while, reopening the laptop will cause libgit2 to loop forever retrying read timeouts.
I confirmed with lldb on a hanging process that in this case:
- the timeouts set by the user are used and libssh2 is indeed not hanging;
- libgit2 retries forever with no way for the user to configure different behavior.
I have two suggestions here:
- sockets created by libgit2 should set SO_KEEPALIVE.
- libgit2 should actually propagate the timeouts up to the end user instead of retrying forever. You already have GIT_OPT_SET_SERVER_TIMEOUT and GIT_OPT_SET_SERVER_CONNECT_TIMEOUT that suggest that this will happen.
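For reference, enabling keepalive on an existing socket is a single setsockopt() call. The sketch below uses only POSIX socket calls; `enable_keepalive` and `keepalive_enabled` are hypothetical helper names, not libgit2 functions, and where such a call would belong in libgit2's socket setup is left open.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn on TCP keepalive probes for `fd`.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int enable_keepalive(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if keepalive is enabled, 0 if not, -1 on error. */
static int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Note that SO_KEEPALIVE only enables the probes. How long the OS waits before sending the first probe is platform-dependent and often long by default, so the idle and interval settings may need tuning as well for this to help in practice.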
I'm happy to make those changes.
WDYT, @ethomson?