Dragonfly
failed to get file length from http client i/o timeout
Ⅰ. Issue Description
The supernode failed to get the file length from the HTTP client:
2019-10-29 03:43:13.453 INFO sign:8 : success to init local ip of supernode, use ip: 122.168.3.213
2019-10-29 03:43:13.453 INFO sign:8 : start to run supernode
2019-10-29 03:43:40.414 INFO sign:8 : success to register peer &{IP:122.168.3.203 HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:40.616 INFO sign:8 : success to register peer &{IP:122.168.3.203 HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:40.810 INFO sign:8 : success to register peer &{IP:122.168.3.203 HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:41.110 INFO sign:8 : success to register peer &{IP:122.168.3.203 HostName:kube-master-3 Port:0 Version:0.4.3}
2019-10-29 03:43:43.416 ERRO sign:8 : failed to get file length from http client for taskID(fc4ace4d1e109d742a7c3de06d5c0dd768a885022fc23fac095c742cf239e457): failed to get http file Length: Get https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f: dial tcp: i/o timeout: {"Code":10,"Msg":"unknow error"}
2019-10-29 03:43:43.416 INFO sign:8 : failed to add or update task with req &{CID:122.168.3.203-186-1572320619.311 CallSystem: Dfdaemon:true Filter:[] Headers:map[Authorization:Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlY0TEg6TkpIVDpSQ1FMOkRUUUg6VVBJQjpOREFJOk5EVlg6VkZNMjo2NURQOlQ1Q086TjZBRTpHTUhPIn0.eyJpc3MiOiJoYXJib3ItdG9rZW4taXNzdWVyIiwic3ViIjoiIiwiYXVkIjoiaGFyYm9yLXJlZ2lzdHJ5IiwiZXhwIjoxNTcyMzIyNDE3LCJuYmYiOjE1NzIzMjA2MTcsImlhdCI6MTU3MjMyMDYxNywianRpIjoicTdyWjROcXJRRVdKRmJNQSIsImFjY2VzcyI6W3sidHlwZSI6InJlcG9zaXRvcnkiLCJuYW1lIjoibGlicmFyeS9uZ2lueCIsImFjdGlvbnMiOlsicHVsbCJdfV19.s5VJJOQvWcVFAt-l3n9PV3SJWZT7hnd014a-8XJJrHRPAULWHhUZiOmdU1XojvUxQx1chuOktXi1M3t81y8-QqoYpgfBHjQn-n7hVp4--v8wiSfxvzVa30sqv42bIEaZ8iZPQMEfuY0m6F4u-1hcIuov6I5CyJCJOsx231LL_aZu97Bd5fHGYx2qJJzCjQ7dtJ7wXIIZgV5Mjp6lomVjIl086rldecCL7OXCsFt_jh3D4LfezTf9GJLneieKKZqxa0CAhwSDQOIyPErjaHhLlJrFGCaCOxCwj20QQD7ZAx69ah8wodgjdnzHwnaWbeQC4B4Sukbc-sfICFrAK3JCd4VoIrwvx4QHibcbT6ZUF8N-FCKgvujWa07KXF96ASYKLCilNWiKQMtffTW2URaPcEccOrYRIMbQIpWa4OXIX-nIHvAnAYNuHQf3ywxS-nwRfjlIVhL1p5I86xFTDYts_k0Mt4G8nbna-dGB1-dSmH9C7hJL1JKltIYII7JL8kJdMYCnKiRiGdqTNEu7V7gcfLX7Y1LpwjH6uFyDP-Sso8Uxwlwmh-5NEqDatR_n18ut6d44fTQZ0nkODdBsOVGkMpFqN1e3SoOUv1hClmrXjilvttKBPXVSgPKUP8w6hAByT7Hcgo5tHOSTbcnC-z40ybdWtvLANAfY4du4jhbpZW4 User-Agent:docker/18.09.5 go/go1.10.8 git-commit/e8ff056 kernel/3.10.0-862.11.6.1.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.5 \(linux\)) X-Forwarded-For:127.0.0.1] Identifier: Md5: Path:/peer/file/471acf92-1404-458c-b5ea-9d2024d9971d-186-1572320619.311 PeerID:kube-master-3-122.168.3.203-1572320620414903266 RawURL:https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f SupernodeIP:122.168.3.213 TaskURL:https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f}: failed to get http file Length: Get 
https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f: dial tcp: i/o timeout: {"Code":10,"Msg":"unknow error"}
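However, we can reach the registry domain manually with wget from inside the supernode container (the name resolves and the connection is made immediately; only certificate verification fails):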
bash-4.4# time wget https://test1.caicloudprivatetest.com/v2/library/nginx/blobs/sha256:faa42fe99fd154460cd5f2174e74b0b004de5a139b7764a990a872f650dc996f
Connecting to test1.caicloudprivatetest.com (122.168.3.218:443)
ssl_client: test1.caicloudprivatetest.com: certificate verification failed: self signed certificate in certificate chain
wget: error getting response: Connection reset by peer
real 0m0.030s
user 0m0.001s
sys 0m0.002s
bash-4.4#
I found the timeout is set to 4s:
// send request
resp, err := HTTPGetTimeout(url, headers, 4*time.Second)
if err != nil {
	return 0, 0, err
}
I am very confused about why it timed out.
Ⅱ. Describe what happened
Ⅲ. Describe what you expected to happen
Ⅳ. How to reproduce it (as minimally and precisely as possible)
Ⅴ. Anything else we need to know?
Ⅵ. Environment:
- dragonfly version: v0.4.3
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
@zhujian7 Did this error happen multiple times or just happen once?
@yeya24 multiple times.
We have fixed that in the master branch. Could you please try again with the master branch? THX
@Starnop I tried with the master branch, but the error still appears. Could you please tell me which PR fixed this problem?
I have the same problem
@cgzchen which version did you use? Did you solve the problem?
Some additional information: the environment I tested in cannot connect to the public network. I also found that /etc/resolv.conf in the supernode container is:
# cat /etc/resolv.conf
search localdomain
nameserver 8.8.8.8
nameserver 8.8.4.4
and /etc/hosts holds:
....
122.168.3.218 test1.caicloudprivatetest.com
....
My conclusion:
- The supernode process resolved the domain (test1.caicloudprivatetest.com) using /etc/resolv.conf, and because the environment cannot reach the public nameserver 8.8.8.8, a timeout occurred.
- When I ran wget https://test1.caicloudprivatetest.com/v2/... manually in the container, it used /etc/hosts to resolve the domain, and resolution succeeded.
So I emptied /etc/resolv.conf, and the supernode could then get the file length normally.
A remaining question: why does the supernode process resolve the domain differently from a manual wget?
cc @cgzchen @Starnop
Adding RUN test -e /etc/nsswitch.conf || echo 'hosts: files dns' > /etc/nsswitch.conf
to the supernode Dockerfile and rebuilding the supernode image solved my problem.
/close