Pushing fails periodically with `dial tcp: lookup github.com: no such host` error, disrupts network

Open hoverbird opened this issue 6 years ago • 18 comments

Hello, good people of Git LFS!

My teammate has been experiencing some very strange, and consistently inconsistent, results when trying to push to our Git LFS repo. We've been trying to debug this for weeks and are out of ideas, but I've noticed a few similar issues on here so I figured maybe there's been some understanding gained recently, although I can't tell if these issues were resolved for others.

My colleague is on Windows 10, git-lfs 2.3.4, git version 2.15.0.windows.1.

Sometimes, when she pushes (either from the command line or from the latest GitHub Desktop client), the push begins successfully, then fails while uploading files to LFS, and she loses all network connectivity on her home network: Slack immediately goes yellow, websites fail to load, she can't ping Google, etc. At first we thought her network was just flaky, but she has no problems uploading streaming video for hours, and a clear pattern has emerged: I've seen her push, then immediately get disconnected from our screen-sharing session. Her network generally recovers in 2-5 minutes, and she is sometimes able to push successfully later in the day. This has been happening for weeks, and the basic things we could think of (refreshing DHCP, restarting the computer, restarting the router, disabling her firewall, etc.) haven't helped.

The errors tend to end with `dial tcp: lookup lfs.github.com: no such host` or `dial tcp: lookup github.com: no such host`, though this may be a symptom of the network already being taken down rather than the cause. I honestly have no clue how a git push could take down her overall internet connectivity, but it really does appear that this is what's happening. Any pointers on how to debug this further?

Thanks for all your great work on LFS!

hoverbird avatar Feb 27 '18 20:02 hoverbird

Thanks for opening the issue. I observed a similar behaviour on one of our Win10 boxes recently (1 out of 1000s of machines). My assumption was/is that something in Git LFS triggers some bug in the Windows network stack (I am not sure how realistic that actually is).

In order to investigate the problem I did the following:

  1. I captured the local Git operations:
$ GIT_TRACE=1 GIT_CURL_VERBOSE=1 git clone https://gitserver/repo.git > clone.log 2>&1
  2. I captured all traffic from the Windows machine to my Git server (1.1.1.1):
$ netsh trace start capture=yes IPv4.Address=1.1.1.1

Afterwards I converted the capture to the Wireshark cab format using Microsoft Message Analyzer.

  3. I captured the traffic from the Windows machine (IP 2.2.2.2) to my Git server on the server:
$ sudo tcpdump -s 0 -i eth0 -w server.pcap host 2.2.2.2

The entire problem went away after a restart of the Win10 box, and I haven't been able to reproduce it since.


@hoverbird: Can you try to capture your traffic in the way described above? I realize that you cannot capture the traffic on the Git server, as it looks like you are using github.com. Can you also provide your exact Windows version with:

$ cmd.exe /c ver

@dscho @jeffhostetler: Do you think someone at @Microsoft would like to look into this further? If yes, then please connect me. I would also be happy to provide my TCP dumps if it helps with debugging somehow. If no, then I am not mad either. I realize that this is a pretty unspecific bug description.

larsxschneider avatar Feb 28 '18 13:02 larsxschneider

Hi, I am just commenting here to acknowledge this issue. I have not myself been able to reproduce this issue. If this is indeed an issue with Git LFS, then I am more than happy to provide a fix once the root cause is known.

ttaylorr avatar Feb 28 '18 23:02 ttaylorr

I highly doubt that this is a Git LFS problem. Git LFS might trigger the problem, but the root cause ought to be somewhere else. I observed the same "immediately loses all network connectivity" behaviour as @hoverbird. If no one from @Microsoft jumps in, then I think we should close the issue as "cannot fix".

larsxschneider avatar Mar 01 '18 08:03 larsxschneider

I'll see what I can do - the problem is only occurring on the machine of one of our artists who works remotely, so I will need to learn a good way to remote-access her Windows box. Also, as you say we are not able to capture the inbound traffic to the git server, so that will cut down on what we're able to provide. I'll update the thread when the problem reoccurs, along with any info we've been able to gather.

hoverbird avatar Mar 01 '18 19:03 hoverbird

Her version info: Microsoft Windows [Version 10.0.16299.248]

hoverbird avatar Mar 01 '18 19:03 hoverbird

Does your artist use a VPN client?

larsxschneider avatar Mar 01 '18 21:03 larsxschneider

No, no VPN.

hoverbird avatar Mar 04 '18 19:03 hoverbird

I have the same problem

$ git push
Uploading LFS objects:   0% (0/1), 23 MB | 202 KB/s, done
batch response: Post https://github.com/SepidehAbadpour/MSc_Thesis.git/info/lfs/objects/batch: dial tcp: lookup github.com: no such host
error: failed to push some refs to 'https://github.com/SepidehAbadpour/MSc_Thesis.git'

SepidehAbadpour avatar Jul 24 '19 07:07 SepidehAbadpour

Same here!

Any solution?

eliooses avatar Jul 27 '19 03:07 eliooses

Hey,

The message dial tcp: lookup github.com: no such host generally indicates a DNS problem of some sort. This is almost always a problem with your network, ISP, or DNS software, so those are places to investigate.
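Because the failures reported in this thread are intermittent, a single lookup may succeed and hide the problem. A minimal sketch of a repeated check follows. `count_dns_failures` is a hypothetical helper, not part of Git or Git LFS, and it assumes `getent hosts` is available (Linux or WSL); on other systems, set `RESOLVER` to any lookup command that exits non-zero when resolution fails.

```shell
# Hypothetical helper (not part of Git LFS): resolve a host several times
# and print how many attempts failed.
# Assumption: `getent hosts` exists; override with RESOLVER="some command"
# whose exit status is non-zero when the lookup fails.
count_dns_failures() {
    cdf_host=$1
    cdf_attempts=${2:-10}
    cdf_failures=0
    cdf_i=0
    while [ "$cdf_i" -lt "$cdf_attempts" ]; do
        # Only the resolver's exit status matters; its output is discarded.
        if ! ${RESOLVER:-getent hosts} "$cdf_host" >/dev/null 2>&1; then
            cdf_failures=$((cdf_failures + 1))
        fi
        cdf_i=$((cdf_i + 1))
        # Pause between attempts (seconds); DELAY defaults to 1.
        sleep "${DELAY:-1}"
    done
    echo "$cdf_failures"
}
```

For example, `count_dns_failures github.com 20` run during a push prints the number of failed lookups out of 20. A non-zero result points at the resolver rather than at Git LFS.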

bk2204 avatar Jul 29 '19 14:07 bk2204

> Hey,
>
> The message dial tcp: lookup github.com: no such host generally indicates a DNS problem of some sort. This is almost always a problem with your network, ISP, or DNS software, so those are places to investigate.

Hello,

I've been searching for a solution for this exact issue. The project I'm working on is hosted on GitLab, however, so that could be a problem. I noticed that a subdomain is being added to the LFS URL. When the initial project was made, it added the same subdomain, which stopped us from cloning. We found out later that our internal URL had changed since then and that removing the subdomain fixes the issue, although LFS doesn't seem to use the updated URL. Just wanted to ask if it's possible to change the URL that LFS uses?

Thanks

edit: I tried setting the lfs.url, but it defaults back to the URL with the subdomain.

ta-cpc avatar Aug 06 '19 09:08 ta-cpc

Whether your hosting implementation uses a subdomain of course depends on your hosting implementation. The URLs you're seeing with your subdomain are likely sent by the server in its response to the batch request. You can verify this by running your command with GIT_TRACE=1 GIT_TRANSFER_TRACE=1 and looking at the URLs listed in the output for the batch POST.
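As a rough sketch of that check, the trace can be saved to a file and then reduced to the distinct hosts it mentions, which makes an unexpected subdomain easy to spot. `list_trace_hosts` and the log file name are hypothetical, and the helper simply extracts anything that looks like a URL instead of relying on a particular trace format.

```shell
# Capture the trace first (the log file name is arbitrary):
#   GIT_TRACE=1 GIT_TRANSFER_TRACE=1 git push > push-trace.log 2>&1
#
# Hypothetical helper: print each distinct scheme://host found in a log file.
list_trace_hosts() {
    # Match up to the first slash, quote, or space after the scheme.
    grep -Eo 'https?://[^/" ]+' "$1" | LC_ALL=C sort -u
}
```

Running `list_trace_hosts push-trace.log` should list the host from your remote URL. Any additional host is one the server supplied.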

It is possible to rewrite those by setting lfs.transfer.enablehrefrewrite, which will allow you to use the standard url.*.insteadof and url.*.pushinsteadof settings to rewrite URLs, assuming you're using Git LFS 2.8.0 or later.
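A minimal sketch of that configuration, wrapped in a hypothetical helper named `enable_lfs_href_rewrite`. The two URLs in the usage note are placeholders, not values from this thread, and the helper writes to the config of the repository it is run in.

```shell
# Hypothetical helper: map an old base URL to a new one for LFS object hrefs.
# Assumes a Git LFS version that supports lfs.transfer.enablehrefrewrite.
enable_lfs_href_rewrite() {
    old_base=$1   # URL prefix the server still hands out
    new_base=$2   # URL prefix that actually resolves
    # Let Git LFS apply url.*.insteadof rules to the hrefs it receives.
    git config lfs.transfer.enablehrefrewrite true
    # Standard Git rewrite rule: URLs starting with old_base use new_base.
    git config "url.${new_base}.insteadOf" "$old_base"
}
```

For example, `enable_lfs_href_rewrite "https://sub.git.example.com/" "https://git.example.com/"` would make both Git and Git LFS contact `git.example.com` whenever a URL begins with the old prefix.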

bk2204 avatar Aug 06 '19 14:08 bk2204

Hello, I solved this problem by manually adding the DNS address on "/etc/hosts".

In my case, it was due to a DNS setup problem. I am using a VPN with WSL 1, and even though Windows can recognize the DNS server, the VPN does not add that DNS server to WSL. I am not sure this is the general case, but I hope it helps.
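One way to check for this kind of mismatch is to list the DNS servers WSL is actually using and compare them with what the VPN assigns on the Windows side. The sketch below assumes the resolver configuration lives in `/etc/resolv.conf`, as it normally does on Linux and WSL; `list_nameservers` is a hypothetical helper name.

```shell
# Hypothetical helper: print the DNS servers listed in a resolv.conf file.
# Defaults to /etc/resolv.conf when no path is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

If the VPN's DNS server is missing from the output of `list_nameservers`, lookups inside WSL will not go through it.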

imcomking avatar Mar 30 '20 10:03 imcomking

> Hello, I solved this problem by manually adding the DNS address on "/etc/hosts".

Could you be more specific?

dskvr avatar Feb 02 '21 00:02 dskvr

You probably don't want to hard-code github.com's addresses into /etc/hosts. That's because they can and do change, and you may find that if you do so, your access to GitHub will fail when they do. This is also true for virtually every other website as well; one of the benefits of DNS is that people can move services without anyone noticing.
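If access starts failing after such an entry was added, it can help to find any pinned addresses so they can be removed. A small sketch, assuming the standard hosts file layout of an address followed by one or more names; `hosts_entries_for` is a hypothetical helper, and the default path applies to Linux, WSL, and macOS (on Windows the file is `C:\Windows\System32\drivers\etc\hosts`).

```shell
# Hypothetical helper: print active hosts-file lines that pin a given name.
# Usage: hosts_entries_for NAME [HOSTS_FILE]
hosts_entries_for() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }                 # skip lines that are comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                if ($i == name) { print; break }
            }
        }
    ' "${2:-/etc/hosts}"
}
```

An empty result from `hosts_entries_for github.com` means the name is resolved through DNS rather than through a hard-coded entry.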

bk2204 avatar Feb 02 '21 15:02 bk2204

Adding to this topic, because it's still a thing in 2022. I had the same problem with a large project. After reading all comments here, I had an idea of where to look. I run Pi-hole as my local DNS-resolver. My git-server is a local machine. So it should be easy, no big networks in between.

  1. Checked my firewalls, maybe they triggered something. Nothing there
  2. Checked network-connection, basic ip to ip: no drops, always up.
  3. Checked if Pi-hole resolved properly: it did.

And then this message popped up in the logs: 2022-01-25 20:00:59 | RATE_LIMIT | Client 192.168.1.178 has been rate-limited (current config allows up to 1000 queries in 60 seconds)

That explained a lot. I'm running Win10, so I edited my hosts file and added the hostname and IP of the git-server, to bypass the Pi-hole. That did the trick!

So this problem has to do with resolving hostnames. And when your dns-resolver starts to throttle things, git-lfs fails.
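To check whether a resolver is throttling a client in the same way, the log can be scanned for rate-limit entries. This sketch assumes the entries look like the line quoted above and that they have been saved to a file whose path you pass in; `rate_limited_clients` is a hypothetical helper, not a Pi-hole command.

```shell
# Hypothetical helper: count RATE_LIMIT entries per client in a log file.
# Assumes lines shaped like:
#   <date> <time> | RATE_LIMIT | Client <address> has been rate-limited (...)
rate_limited_clients() {
    sed -n 's/.*| RATE_LIMIT | Client \([0-9a-fA-F.:]*\) has been rate-limited.*/\1/p' "$1" |
        LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

Each output line is a client address followed by the number of times it was rate-limited. A machine that shows up here around the time of a failed push is a likely match for this failure mode.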

johannesmateboer avatar Jan 25 '22 19:01 johannesmateboer

As I mentioned just above this comment, adding the hostname to your hosts file will likely cause breakage. It would be better to increase the rate limit or run a local DNS caching server.

bk2204 avatar Jan 25 '22 19:01 bk2204

Increasing the limit is not possible for everyone. If you use e.g. Google or OpenDNS, you can't do anything about that. I'm running a locally hosted Git-server, so I can change that. I agree on running a local cache, but that can also produce the exact same problem you described, because the cache can be outdated if you're unlucky. I added my response so others can check their setup for this problem and move forward.

johannesmateboer avatar Jan 25 '22 20:01 johannesmateboer

Maybe the way I "solved" this in my case may help someone. I was facing the same problem consistently. After changing my connection from WiFi to cable, the issue disappeared and never happened again.

jonasluz avatar Feb 08 '23 21:02 jonasluz

> Hey,
>
> The message dial tcp: lookup github.com: no such host generally indicates a DNS problem of some sort. This is almost always a problem with your network, ISP, or DNS software, so those are places to investigate.

Yes Sir! I'm using Mac.

❯ sudo networksetup -setdnsservers Wi-Fi 8.8.8.8
Password:
❯ networksetup -getdnsservers Wi-Fi
8.8.8.8
❯ git push -u origin main
Uploading LFS objects: 100% (1/1), 400 MB | 8.1 MB/s, done.
Enumerating objects: 39, done.
Counting objects: 100% (39/39), done.
Delta compression using up to 4 threads
Compressing objects: 100% (27/27), done.
Writing objects: 94% (37/39)

bongaquino avatar Apr 08 '23 07:04 bongaquino