ngx_upstream_resolveMK icon indicating copy to clipboard operation
ngx_upstream_resolveMK copied to clipboard

lost some request while DNS resolving

Open HatsuneMiku3939 opened this issue 8 years ago • 6 comments

reported from #1

ngx_http_upstream_resolveMK_get_peer fitting early resolved values into peer (before RESOLVE_STATS_DONE). Near one second, but happens time-to-time and in result 0.05% of requests going to black hole. It's a lot.

2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolve_start ok for "home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "_php-fpm-main-9000._tcp.home-docker.mcbyte.net" SRV 39707
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:0)
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolving
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:12 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:0)

resolving not finished, but now peer have wrong value "192.168.1.11:0, 192.168.1.11:0" (as one peer) and all request failed

2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response 39707 fl:8580 1/1/0/1
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response qt:33 cl:1
2017/03/13 20:08:12 [debug] 23470#0: resolver qs:_php-fpm-main-9000._tcp.home-docker.mcbyte.net
2017/03/13 20:08:12 [debug] 23470#0: resolver nsrvs:1 cname:0000000000000000 ttl:68
2017/03/13 20:08:12 [debug] 23470#0: resolve: "620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net"
2017/03/13 20:08:12 [debug] 23470#0: resolve: "620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net" A 55296
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response 55296 fl:8580 1/1/0/0
2017/03/13 20:08:12 [debug] 23470#0: resolver DNS response qt:1 cl:1
2017/03/13 20:08:12 [debug] 23470#0: resolver qs:620009deb596._php-fpm-main-9000._tcp.home-docker.mcbyte.net
2017/03/13 20:08:12 [debug] 23470#0: resolver naddrs:1 cname:0000000000000000 ttl:68
2017/03/13 20:08:12 [debug] 23470#0: resolve name done: 0
2017/03/13 20:08:12 [debug] 23470#0: resolver expire
2017/03/13 20:08:12 [debug] 23470#0: upstream_resolveMK: "home-docker.mcbyte.net" resolved state(0: Unknown error)
2017/03/13 20:08:12 [debug] 23470#0: resolve name done: 0
2017/03/13 20:08:12 [debug] 23470#0: resolver expire
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: update from DNS cache
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: resolved_num=1
2017/03/13 20:08:13 [debug] 23470#0: *1 upstream_resolveMK: upstream to DNS peer (192.168.1.11:9002)

resolving finished, now peer is "192.168.1.11:9002"

HatsuneMiku3939 avatar Mar 14 '17 09:03 HatsuneMiku3939

@TrurlMcByte Can you reproduce it?

HatsuneMiku3939 avatar Apr 09 '17 08:04 HatsuneMiku3939

It's perfectly stable :) So, any time on start it's happens just "by design". First few (from 1 to 10 at home, depending on DNS resolving chain) seconds ngx_resolve return A record (if exists), but not SRV, because resolving not finished yet. Also it's happens after long break in requests when internal DNS cache is empty. As DNS server I'm use standard SkyDNS

TrurlMcByte avatar Apr 09 '17 09:04 TrurlMcByte

I get the same using Mesos DNS as well.

justinholmes avatar Apr 09 '17 23:04 justinholmes

Thank you for your reporting. Lost some request after long break is critical problem. I'm working on it.

HatsuneMiku3939 avatar Apr 10 '17 01:04 HatsuneMiku3939

I experience the same problem. What is the ETA on a fix @HatsuneMiku3939?

jdegger avatar Apr 21 '17 12:04 jdegger

+1 having this problem also, causing issues on critical services :(

phil-hudson avatar Apr 21 '17 12:04 phil-hudson