Occasional "fatal: unable to connect to localhost" on CI

Open EliahKagan opened this issue 2 years ago • 7 comments

From time to time I get an error like this on CI:

FAILED test/test_base.py::TestBase::test_with_rw_remote_and_rw_repo - git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git ls-remote daemon_origin
  stderr: 'fatal: unable to connect to localhost:
localhost[0: ::1]: errno=Connection refused
localhost[1: 127.0.0.1]: errno=Connection refused
'

I'm not sure why this happens, but I see it every few days or so, on days I'm pushing lots of commits (more precisely: on days I run git push many times, since CI is only running on the tip of the pushed branch). It happens in my fork, in PRs here on this upstream repository (as in #1675 detailed above), and in pushes to this upstream repository (as in d1c1f31). Rerunning the CI check gets rid of it; that is, it doesn't recur. I don't think I've had this locally, but I don't run the tests locally as much as on CI.

Searching reveals that something like this, perhaps the exact thing, happened in #1119, so this may be known. I'm not sure if it's known to still occur occasionally, or if there is anything that can reasonably be done to make it happen even less often or not at all. (If it's not worth having an issue on this, then I don't mind this simply being closed.)

EliahKagan avatar Sep 21 '23 11:09 EliahKagan

As a brief update that I hope to flesh out more at some point: I suspect this is actually related to the problem that HIDE_WINDOWS_FREEZE_ERRORS is about on native Windows systems. The same tests, or at least one of them, seem affected. I suspect what's happening is that git-daemon is occasionally unresponsive on any platform (though it seems to be a CI issue, as I tried running this test 10,000 times on my Ubuntu system today and everything was fine), but that on some Windows systems the connection does not time out for much longer. This hunch, which could very well be wrong, is based on a wispy recollection of other issues on some Windows systems with network connections to unresponsive servers blocking for an extended time. I don't remember the details. To be clear, this is not something that Windows users would expect to experience regularly; I believe it's something specific that I am just not fully recalling.

EliahKagan avatar Sep 26 '23 08:09 EliahKagan

I don't believe I've observed this problem recently, and this was never marked confirmed. This should probably be closed after some further amount of time if I don't observe the problem again (and if no one else mentions it either). But I am not sure how long is best to wait.

EliahKagan avatar Jan 26 '24 09:01 EliahKagan

Never mind, it still happens sometimes.

EliahKagan avatar Jan 27 '24 03:01 EliahKagan

I think the way to proceed with this is to modify the one affected test so that, when it fails in this specific way, it retries several times. Since, as noted above, this situation may already be a cause of extended blocking on Windows (where the test is disabled by default and not currently run on CI), retrying should probably only be done on non-Windows systems.

This should achieve at least one of two things:

  • If the failures are random, the problem is effectively fixed, because failing on each try will be no more common than other unusual kinds of CI failures that tend not to persist (e.g. failing to check out the repository in the first place).
  • If the failures are not random, such that retrying often also fails, then we have learned something about the problem that may help to figure out the cause.
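
To make the idea concrete, here is a rough sketch of the kind of retry logic this could involve. It is not the actual change to the test. The helper name retry_on_connection_refused and its parameters are hypothetical, and detecting the failure by looking for "Connection refused" in the exception's string form is an assumption based on the stderr shown at the top of this issue.

```python
import sys
import time


def retry_on_connection_refused(func, attempts=3, delay=0.0, is_windows=None):
    """Call func(), retrying only on failures that look like a refused connection.

    Hypothetical helper for illustration. On Windows no retry is attempted,
    since a retry there might only prolong an already long block.
    """
    if is_windows is None:
        is_windows = sys.platform == "win32"
    if is_windows:
        attempts = 1  # Single try only: never retry on Windows.

    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            # Assumption: the error text includes git's stderr, as in the
            # failure message quoted in this issue.
            looks_transient = "Connection refused" in str(error)
            if not looks_transient or attempt == attempts:
                raise  # Unrelated failure, or out of tries: report it as usual.
            time.sleep(delay)
```

In the affected test, the call that runs git ls-remote against the daemon would be what gets passed as func. If every try fails, the last exception still propagates, so a failure that persists across retries remains visible on CI.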

EliahKagan avatar Aug 16 '24 01:08 EliahKagan

Sounds good to me, thank you!

Byron avatar Aug 16 '24 08:08 Byron