ngx_upstream_resolveMK
ngx_upstream_resolveMK copied to clipboard
lost some request while DNS resolving
reported from #1
ngx_http_upstream_resolveMK_get_peer fitting early resolved values into peer (before RESOLVE_STATS_DONE). Near one second, but happens time-to-time and in result 0.05% of requests going to black hole. It's a lot.
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolve_start ok for "home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "_php-fpm-main-9000._tcp.home-docker.mcbyte.net" SRV 39707
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:0)
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolving
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:0)
resolving not finished, but now peer have wrong value "192.168.1.11:0, 192.168.1.11:0" (as one peer) and all request failed
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response 39707 fl:8580 1/1/0/1
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response qt:33 cl:1
2017/03/13 20:08:12 [debug] 23470#0: resolver qs:_php-fpm-main-9000._tcp.home-docker.mcbyte.net
2017/03/13 20:08:12 [debug] 23470#0: resolver nsrvs:1 cname:0000000000000000 ttl:68
2017/03/13 20:08:12 [debug] 23470#0: resolve: "620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net" A 55296
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response 55296 fl:8580 1/1/0/0
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response qt:1 cl:1
2017/03/13 20:08:12 [debug] 23470#0: resolver qs:620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net
2017/03/13 20:08:12 [debug] 23470#0: resolver naddrs:1 cname:0000000000000000 ttl:68
2017/03/13 20:08:12 [debug] 23470#0: resolve name done: 0
2017/03/13 20:08:12 [debug] 23470#0: resolver expire
2017/03/13 20:08:12 [debug] 23470#0: upstream_resolveMK: "home-docker.mcbyte.net" resolved state(0: Unknown error)
2017/03/13 20:08:12 [debug] 23470#0: resolve name done: 0
2017/03/13 20:08:12 [debug] 23470#0: resolver expire
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: update from DNS cache
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:9002)
resolving finished, now peer is "192.168.1.11:9002"
@TrurlMcByte Can you reproduce it?
It's perfectly stable :) So, any time on start it's happens just "by design". First few (from 1 to 10 at home, depending on DNS resolving chain) seconds ngx_resolve return A record (if exists), but not SRV, because resolving not finished yet. Also it's happens after long break in requests when internal DNS cache is empty. As DNS server I'm use standard SkyDNS
I get the same using Mesos DNS as well.
Thank you for your reporting. Lost some request after long break is critical problem. I'm working on it.
I experience the same problem. What is the ETA on a fix @HatsuneMiku3939?
+1 having this problem also, causing issues on critical services :(