chproxy

cannot reach clickhouse host?

Open sunny19930321 opened this issue 5 years ago • 20 comments

When using the agent, what causes the following error?

Caused by: java.lang.Throwable: [ Id: 160857F337DC563A; User "tmplarge"(1) proxying as "default"(1) to "d085126100.aliyun.com:8123"(6); RemoteAddr: "10.13.56.73:51080"; LocalAddr: "10.85.129.101:9090"; Duration: 825 μs]: cannot reach d085126100.aliyun.com:8123; query: "select timezone()\nFORMAT TabSeparatedWithNamesAndTypes;"

sunny19930321 avatar May 14 '20 06:05 sunny19930321

We are hitting the same problem; our chproxy version is v1.14.0. Does anybody know how to fix it?

VitoLiao avatar Jun 19 '20 05:06 VitoLiao

Can you verify if this request works without using agent, maybe via curl or any other http client?

hagen1778 avatar Jun 22 '20 18:06 hagen1778

Facing the same issue intermittently? Are there updates on this?

sidanasparsh avatar Jul 09 '20 00:07 sidanasparsh

+1

[502] [ Id: 161FB3A842CE4135; User "default"(1) proxying as "default"(1) to "CHCluster03:8123"(6); RemoteAddr: "10.0.0.24:51324"; LocalAddr: "10.0.0.12:9090"; Duration: 16798 μs]: cannot reach CHCluster03:8123; query: ..

CHProxy 1.14

apetrov88 avatar Jul 11 '20 12:07 apetrov88

Can you verify if this request works without using agent, maybe via curl or any other http client?

@hagen1778, I verified that the ClickHouse service responds fine when requested directly.

sunny19930321 avatar Jul 11 '20 12:07 sunny19930321

Facing the same issue intermittently? Are there updates on this?

@sidanasparsh I tried adjusting the timeout, but it doesn't seem to help.

sunny19930321 avatar Jul 11 '20 12:07 sunny19930321

Facing the same issue intermittently? Are there updates on this?

Have you found the cause of this problem?

sunny19930321 avatar Sep 29 '20 08:09 sunny19930321

+1

[502] [ Id: 161FB3A842CE4135; User "default"(1) proxying as "default"(1) to "CHCluster03:8123"(6); RemoteAddr: "10.0.0.24:51324"; LocalAddr: "10.0.0.12:9090"; Duration: 16798 μs]: cannot reach CHCluster03:8123; query: ..

CHProxy 1.14

Have you found the cause of this problem?

sunny19930321 avatar Sep 29 '20 08:09 sunny19930321

+1, how to fix it?

karas2015 avatar Jan 12 '21 12:01 karas2015

The proxy returns the "cannot reach" error when it is unable to establish a connection to the given address: https://github.com/Vertamedia/chproxy/blob/1758e7399fe57c97aeec8e55dd13c6300399969b/proxy.go#L186-L193. I do not know what causes this, because many things can affect reachability along the "your_application"<=>"chproxy"<=>"clickhouse" path. By the way, the proxy exposes a host_health metric (and plenty of others) that shows whether a configured CH host is reachable. Can you check the state of this metric at the moments when a query from the agent fails? If you send queries without the agent, do they work?
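To make the host_health check concrete, here is a minimal Go sketch that fetches a Prometheus-style metrics page and prints only the host_health samples. It rests on assumptions that are not confirmed in this thread: that the metrics are served as plain text at a URL you pass on the command line (for example a /metrics path on the proxy's listen address), and that the metric name carries no extra prefix. The label names in any output depend on your setup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// hostHealthLines returns every sample line of the host_health metric found
// in a Prometheus text-format payload. Comment lines (# HELP, # TYPE) and
// other metrics are skipped.
func hostHealthLines(payload string) []string {
	var out []string
	for _, line := range strings.Split(payload, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "host_health{") || strings.HasPrefix(line, "host_health ") {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Assumption: the first argument is the full URL of the proxy's metrics page.
	if len(os.Args) < 2 {
		fmt.Println("usage: hosthealth <metrics-url>")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(os.Args[1])
	if err != nil {
		fmt.Println("cannot fetch metrics:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("cannot read metrics:", err)
		return
	}
	for _, line := range hostHealthLines(string(body)) {
		fmt.Println(line)
	}
}
```

Running this in a loop while the failing queries occur would show whether the proxy itself considered the host unreachable at that moment.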

hagen1778 avatar Jan 16 '21 12:01 hagen1778

We are experiencing the same kind of connection issues

ERROR: 2021/02/03 06:36:39 proxy.go:192: [ Id: 165F9CDFD12DDA70; User "compass-insert-hits"(1) proxying as "admin"(1) to "myclusternode02.io"(6); RemoteAddr: "100.64.4.2:50028"; LocalAddr: "100.65.205.183:80"; Duration: 60133561 μs]: cannot reach myclusternode02.io:8123; query: "INSERT INTO hits ........."
...
ERROR: 2021/02/03 13:05:11 scope.go:643: error while health-checking "myclusternode01.io:8123" host: cannot send request in 3.00015176s: Get "http://myclusternode01.io.io:8123/?query=SELECT%201": context deadline exceeded
.....
ERROR: 2021/02/04 11:46:26 proxy.go:192: [ Id: 165F9CDFD1355353; User "newsroom-cache"(1) proxying as "default"(1) to "asinglechnode.io:8123"(6); RemoteAddr: "100.65.205.170:47344"; LocalAddr: "100.65.205.183:80"; Duration: 1097 μs]: cannot reach asinglechnode.io:8123; query: "SELECT maxMerge(Hit.event_time) as maxEventTime......"

We have done a full network check, simulating the health checks with curl and checking for packet loss with mtr, but found nothing. The network is working flawlessly, with not a single lost packet or failed curl request.
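One difference between a curl-based simulation and a long-running client is connection reuse: each curl invocation opens a fresh connection, while a Go http.Client keeps idle connections and reuses them. The sketch below is not chproxy's actual health-check code; it only imitates the request visible in the log (GET /?query=SELECT%201) with a shared client. The 3-second timeout roughly matches the "cannot send request in 3.00015176s" message, and the 5-second pause between probes is a hypothetical choice.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// healthURL builds the health-check URL seen in the log above.
func healthURL(host string) string {
	return "http://" + host + "/?query=SELECT%201"
}

// probe sends one health-check request and returns how long it took.
func probe(client *http.Client, host string, timeout time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL(host), nil)
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		return time.Since(start), err
	}
	defer resp.Body.Close()

	// Drain the body so the connection can return to the idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return time.Since(start), err
	}
	if resp.StatusCode != http.StatusOK {
		return time.Since(start), fmt.Errorf("unexpected status %s", resp.Status)
	}
	return time.Since(start), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <host:port>")
		return
	}
	// A single shared client reuses idle connections between probes.
	client := &http.Client{}
	for i := 0; i < 20; i++ {
		took, err := probe(client, os.Args[1], 3*time.Second)
		fmt.Printf("probe %d: took=%v err=%v\n", i, took, err)
		time.Sleep(5 * time.Second) // hypothetical pause between probes
	}
}
```

If probes through a shared client fail while one-off curl requests succeed, that would point at connection reuse rather than at the network.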

JustHarris avatar Feb 04 '21 16:02 JustHarris

We have the same problem with three ClickHouse nodes. The ClickHouse backends are reachable directly at the times the errors occur.

akimrx avatar Mar 12 '21 11:03 akimrx

The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.

bzed avatar Jul 07 '21 08:07 bzed

Using

diff --git a/proxy.go b/proxy.go
index 11684b6..356cd2c 100644
--- a/proxy.go
+++ b/proxy.go
@@ -3,6 +3,7 @@ package main
 import (
        "context"
        "fmt"
+       "net"
        "net/http"
        "net/http/httputil"
        "net/url"
@@ -43,6 +44,24 @@ func newReverseProxy() *reverseProxy {
                        // Suppress error logging in ReverseProxy, since all the errors
                        // are handled and logged in the code below.
                        ErrorLog: log.NilLogger,
+                       ErrorHandler: func(rw http.ResponseWriter, req *http.Request, err error) {
+                               log.Errorf("http: proxy error: %v", err)
+                               rw.WriteHeader(http.StatusBadGateway)
+                       },
+                       Transport: &http.Transport{
+                               // DisableKeepAlives: false,
+                               // Proxy: http.ProxyFromEnvironment,
+                               DialContext: (&net.Dialer{
+                                       Timeout:   2 * time.Second,
+                                       KeepAlive: 2 * time.Second,
+                                       DualStack: true,
+                               }).DialContext,
+                               // ForceAttemptHTTP2:     true,
+                               // MaxIdleConns:          100,
+                               // IdleConnTimeout:       90 * time.Second,
+                               // TLSHandshakeTimeout:   10 * time.Second,
+                               // ExpectContinueTimeout: 1 * time.Second,
+                       },
                },
                reloadSignal: make(chan struct{}),
                reloadWG:     sync.WaitGroup{},

seems to work fine so far. Based on #121 - but just lowering keepalive time/timeouts.

bzed avatar Jul 07 '21 09:07 bzed

Seems to make things better, but doesn't fix them unfortunately.

bzed avatar Jul 07 '21 09:07 bzed

Jul 07 11:48:00 mon01 chproxy[40652]: ERROR: 2021/07/07 09:48:00 proxy.go:48: http: proxy error: net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:58570->127.0.0.1:8123: write: broken pipe

bzed avatar Jul 07 '21 09:07 bzed

The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.

@bzed I also observed a similar phenomenon, but I think the HTTP parameter <keep_alive_timeout>3</keep_alive_timeout> in CH corresponds to the IdleConnTimeout of the Transport in CHProxy.

So I raised keep_alive_timeout in CH to 90, the same value as in CHProxy. That does work! The "cannot reach" host error is gone.
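For readers unfamiliar with the two Go settings being compared, the following minimal sketch sets them side by side. In Go's standard library, net.Dialer.KeepAlive controls TCP keep-alive probes on an open socket, while http.Transport.IdleConnTimeout controls how long an unused connection stays in the client's pool before the client closes it. The helper name newTransport and the concrete durations are illustrative only and are not taken from the chproxy source.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newTransport builds an http.Transport in which the two "keepalive" knobs
// are set independently of each other.
func newTransport(tcpKeepAlive, idleConnTimeout time.Duration) *http.Transport {
	return &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second, // limit for establishing the TCP connection
			KeepAlive: tcpKeepAlive,     // interval between TCP keep-alive probes
		}).DialContext,
		// Maximum time an idle HTTP connection stays in the pool. This is the
		// setting to compare with the server-side keep_alive_timeout.
		IdleConnTimeout: idleConnTimeout,
	}
}

func main() {
	// 90 seconds mirrors the value mentioned above for CHProxy.
	tr := newTransport(30*time.Second, 90*time.Second)
	fmt.Println("IdleConnTimeout:", tr.IdleConnTimeout)
}
```

Changing only Dialer.KeepAlive, as in the earlier diff, leaves IdleConnTimeout untouched, so the two changes are not interchangeable.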

mchades avatar Aug 17 '21 07:08 mchades

@mchades very good

sunny19930321 avatar Aug 18 '21 03:08 sunny19930321

The client keepalive should be substantially bigger than the server keepalive: https://github.com/ClickHouse/ClickHouse/issues/52571#issuecomment-1650266265 https://github.com/ClickHouse/ClickHouse/pull/53068

-- update: bigger not smaller

den-crane avatar Oct 16 '23 08:10 den-crane

The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.

@bzed I also observed a similar phenomenon, but I think the HTTP parameter <keep_alive_timeout>3</keep_alive_timeout> in CH corresponds to the IdleConnTimeout of the Transport in CHProxy.

So I raised keep_alive_timeout in CH to 90, the same value as in CHProxy. That does work! The "cannot reach" host error is gone.

That did work! The following code shows the related settings for the HTTP Transport in CHProxy.

    transport := &http.Transport{
        Proxy: http.ProxyFromEnvironment,
        DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
            dialer := &net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
            }
            return dialer.DialContext(ctx, network, addr)
        },
        ForceAttemptHTTP2:     true,
        MaxIdleConns:          cfgCp.MaxIdleConns,
        MaxIdleConnsPerHost:   cfgCp.MaxIdleConnsPerHost,
        IdleConnTimeout:       9 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    }

egplat avatar Nov 13 '23 07:11 egplat