failed to get file length from http client i/o timeout

Open zhujian7 opened this issue 5 years ago • 9 comments

Ⅰ. Issue Description

supernode failed to get file length from http client:

2019-10-29 03:43:13.453 INFO sign:8 : success to init local ip of supernode, use ip:
2019-10-29 03:43:13.453 INFO sign:8 : start to run supernode
2019-10-29 03:43:40.414 INFO sign:8 : success to register peer &{IP: HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:40.616 INFO sign:8 : success to register peer &{IP: HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:40.810 INFO sign:8 : success to register peer &{IP: HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:41.110 INFO sign:8 : success to register peer &{IP: HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:43.416 ERRO sign:8 : failed to get file length from http client for taskID(fc4ace4d1e109d742a7c3de06d5c0dd768a885022fc23fac095c742cf239e457): failed to get http file Length: Get https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f: dial tcp: i/o timeout: {"Code":10,"Msg":"unknow error"}
2019-10-29 03:43:43.416 INFO sign:8 : failed to add or update task with req &{CID: CallSystem: Dfdaemon:true Filter:[] Headers:map[Authorization:Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlY0TEg6TkpIVDpSQ1FMOkRUUUg6VVBJQjpOREFJOk5EVlg6VkZNMjo2NURQOlQ1Q086TjZBRTpHTUhPIn0.eyJpc3MiOiJoYXJib3ItdG9rZW4taXNzdWVyIiwic3ViIjoiIiwiYXVkIjoiaGFyYm9yLXJlZ2lzdHJ5IiwiZXhwIjoxNTcyMzIyNDE3LCJuYmYiOjE1NzIzMjA2MTcsImlhdCI6MTU3MjMyMDYxNywianRpIjoicTdyWjROcXJRRVdKRmJNQSIsImFjY2VzcyI6W3sidHlwZSI6InJlcG9zaXRvcnkiLCJuYW1lIjoibGlicmFyeS9uZ2lueCIsImFjdGlvbnMiOlsicHVsbCJdfV19.s5VJJOQvWcVFAt-l3n9PV3SJWZT7hnd014a-8XJJrHRPAULWHhUZiOmdU1XojvUxQx1chuOktXi1M3t81y8-QqoYpgfBHjQn-n7hVp4--v8wiSfxvzVa30sqv42bIEaZ8iZPQMEfuY0m6F4u-1hcIuov6I5CyJCJOsx231LL_aZu97Bd5fHGYx2qJJzCjQ7dtJ7wXIIZgV5Mjp6lomVjIl086rldecCL7OXCsFt_jh3D4LfezTf9GJLneieKKZqxa0CAhwSDQOIyPErjaHhLlJrFGCaCOxCwj20QQD7ZAx69ah8wodgjdnzHwnaWbeQC4B4Sukbc-sfICFrAK3JCd4VoIrwvx4QHibcbT6ZUF8N-FCKgvujWa07KXF96ASYKLCilNWiKQMtffTW2URaPcEccOrYRIMbQIpWa4OXIX-nIHvAnAYNuHQf3ywxS-nwRfjlIVhL1p5I86xFTDYts_k0Mt4G8nbna-dGB1-dSmH9C7hJL1JKltIYII7JL8kJdMYCnKiRiGdqTNEu7V7gcfLX7Y1LpwjH6uFyDP-Sso8Uxwlwmh-5NEqDatR_n18ut6d44fTQZ0nkODdBsOVGkMpFqN1e3SoOUv1hClmrXjilvttKBPXVSgPKUP8w6hAByT7Hcgo5tHOSTbcnC-z40ybdWtvLANAfY4du4jhbpZW4 User-Agent:docker/18.09.5 go/go1.10.8 git-commit/e8ff056 kernel/3.10.0-862.11.6.1.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.5 \(linux\)) X-Forwarded-For:] Identifier: Md5: Path:/peer/file/471acf92-1404-458c-b5ea-9d2024d9971d-186-1572320619.311 PeerID:kube-master-3- RawURL:https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f SupernodeIP: TaskURL:https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f}: failed to get http file Length: Get 
https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f: dial tcp: i/o timeout: {"Code":10,"Msg":"unknow error"}

However, the registry domain can be reached manually with wget from inside the supernode container:

bash-4.4# time wget https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f
Connecting to test1.caicloudprivatetest.com (
ssl_client: test1.caicloudprivatetest.com: certificate verification failed: self signed certificate in certificate chain
wget: error getting response: Connection reset by peer

real    0m0.030s
user    0m0.001s
sys     0m0.002s

I found the timeout is set to 4s:

	// send request
	resp, err := HTTPGetTimeout(url, headers, 4*time.Second)
	if err != nil {
		return 0, 0, err
	}

I am confused about why it timed out.

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • dragonfly version: v0.4.3
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

zhujian7 avatar Oct 29 '19 04:10 zhujian7

@zhujian7 Did this error happen multiple times or just once?

yeya24 avatar Oct 29 '19 04:10 yeya24

@yeya24 It happened multiple times.

zhujian7 avatar Oct 29 '19 06:10 zhujian7

We have fixed that in the master branch. Could you please try again with the master branch? THX

starnop avatar Oct 29 '19 08:10 starnop

@Starnop I tried with the master branch, but the error still appears. Could you please tell me which PR fixed this problem?

zhujian7 avatar Oct 30 '19 10:10 zhujian7

I have the same problem

gzchen008 avatar May 26 '20 04:05 gzchen008

@cgzchen which version did you use? Did you solve the problem?

zhujian7 avatar Jun 02 '20 10:06 zhujian7

Some additional information: the environment I tested in cannot reach the public network. I also found that /etc/resolv.conf in the supernode container contains:

# cat /etc/resolv.conf

search localdomain


and /etc/hosts holds:

.... test1.caicloudprivatetest.com

My conclusion is:

  • the supernode process resolved the domain (test1.caicloudprivatetest.com) using /etc/resolv.conf, and because the environment cannot reach the public network, the lookup timed out.
  • when I ran wget https://test1.caicloudprivatetest.com/v2/... manually in the container, it used /etc/hosts to resolve the domain, and the lookup succeeded.

So I emptied /etc/resolv.conf, and the supernode can now get the file length normally.

A remaining question: why does the supernode process resolve the domain differently from a manual wget?
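One possible explanation, consistent with the fix reported later in this thread, is that the two programs consult the name sources in a different order. To illustrate what a "files" lookup amounts to, here is a minimal sketch that scans hosts-file content for a name. lookupInHosts is a hypothetical helper written for this illustration; it is not how wget or Go implements the lookup.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// lookupInHosts is a hypothetical helper that mimics the "files" source:
// it scans hosts-file content and returns every address mapped to name.
func lookupInHosts(content, name string) []string {
	var addrs []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank line or address without names
		}
		for _, h := range fields[1:] {
			if strings.EqualFold(h, name) {
				addrs = append(addrs, fields[0])
				break
			}
		}
	}
	return addrs
}

func main() {
	data, err := os.ReadFile("/etc/hosts")
	if err != nil {
		fmt.Println("cannot read /etc/hosts:", err)
		return
	}
	// Replace "localhost" with the registry domain to check the container.
	fmt.Println(lookupInHosts(string(data), "localhost"))
}
```

A resolver that tries this source first never needs to contact a DNS server for names listed in /etc/hosts, so it cannot time out on them. A resolver that tries DNS first has to wait for the unreachable name server before it falls back.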

cc @cgzchen @Starnop

zhujian7 avatar Jun 04 '20 01:06 zhujian7

Adding the following line to the supernode Dockerfile and rebuilding the supernode image solved my problem:

RUN test -e /etc/nsswitch.conf || echo 'hosts: files dns' > /etc/nsswitch.conf
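The Dockerfile line in this comment writes "hosts: files dns" only when /etc/nsswitch.conf does not already exist. Go's net package reads that file on Linux when deciding the order of name sources, so a quick check of what the container actually has can confirm the fix took effect. The sketch below is an illustration only; hostsSources is a hypothetical helper, and it makes no claim about which default Go or any libc applies when the file is missing.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hostsSources is a hypothetical helper: it returns the sources listed on
// the "hosts:" line of nsswitch.conf content, in order. Bracketed action
// items such as [NOTFOUND=return] are skipped. It returns nil when there
// is no "hosts:" line.
func hostsSources(content string) []string {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "hosts:") {
			continue
		}
		var sources []string
		inBracket := false
		for _, f := range strings.Fields(strings.TrimPrefix(line, "hosts:")) {
			if strings.HasPrefix(f, "[") {
				inBracket = true
			}
			if !inBracket {
				sources = append(sources, f)
			}
			if strings.HasSuffix(f, "]") {
				inBracket = false
			}
		}
		return sources
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/etc/nsswitch.conf")
	if os.IsNotExist(err) {
		fmt.Println("/etc/nsswitch.conf is missing; the resolver uses its built-in default order")
		return
	}
	if err != nil {
		fmt.Println("cannot read /etc/nsswitch.conf:", err)
		return
	}
	fmt.Println("hosts sources:", hostsSources(string(data)))
}
```

With the rebuilt image, the program should report files before dns, which matches the behaviour observed with wget.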

zhujian7 avatar Jun 17 '20 02:06 zhujian7


zhujian7 avatar Aug 31 '20 13:08 zhujian7