chproxy
cannot reach clickhouse host?
When using an agent, what causes the following error? Caused by: java.lang.Throwable: [ Id: 160857F337DC563A; User "tmplarge"(1) proxying as "default"(1) to "d085126100.aliyun.com:8123"(6); RemoteAddr: "10.13.56.73:51080"; LocalAddr: "10.85.129.101:9090"; Duration: 825 μs]: cannot reach d085126100.aliyun.com:8123; query: "select timezone()\nFORMAT TabSeparatedWithNamesAndTypes;"
We are hitting the same problem. Our chproxy version is v1.14.0. Does anybody know how to fix it?
Can you verify whether this request works without the agent, e.g. via curl or any other HTTP client?
Facing the same issue intermittently. Are there any updates on this?
+1
[502] [ Id: 161FB3A842CE4135; User "default"(1) proxying as "default"(1) to "CHCluster03:8123"(6); RemoteAddr: "10.0.0.24:51324"; LocalAddr: "10.0.0.12:9090"; Duration: 16798 μs]: cannot reach CHCluster03:8123; query: ..
CHProxy 1.14
@hagen1778, we verified that the CK (ClickHouse) service responds to requests normally.
@sidanasparsh We tried adjusting the timeout, but it doesn't seem to help.
Have you found the cause of this problem?
+1. How can this be fixed?
The proxy returns a "cannot reach" error when it is unable to establish a connection to the given address: https://github.com/Vertamedia/chproxy/blob/1758e7399fe57c97aeec8e55dd13c6300399969b/proxy.go#L186-L193. I do not know what causes this, because many factors can affect reachability along the "your_application" <=> "chproxy" <=> "clickhouse" path.
Btw, the proxy exposes the host_health metric (and plenty of others), which shows whether each configured CH host is reachable. Can you check the state of this metric at the moments when queries from the agent fail? And if you send queries without the agent, do they work?
We are experiencing the same kind of connection issues:
ERROR: 2021/02/03 06:36:39 proxy.go:192: [ Id: 165F9CDFD12DDA70; User "compass-insert-hits"(1) proxying as "admin"(1) to "myclusternode02.io"(6); RemoteAddr: "100.64.4.2:50028"; LocalAddr: "100.65.205.183:80"; Duration: 60133561 μs]: cannot reach myclusternode02.io:8123; query: "INSERT INTO hits ........."
...
ERROR: 2021/02/03 13:05:11 scope.go:643: error while health-checking "myclusternode01.io:8123" host: cannot send request in 3.00015176s: Get "http://myclusternode01.io.io:8123/?query=SELECT%201": context deadline exceeded
.....
ERROR: 2021/02/04 11:46:26 proxy.go:192: [ Id: 165F9CDFD1355353; User "newsroom-cache"(1) proxying as "default"(1) to "asinglechnode.io:8123"(6); RemoteAddr: "100.65.205.170:47344"; LocalAddr: "100.65.205.183:80"; Duration: 1097 μs]: cannot reach asinglechnode.io:8123; query: "SELECT maxMerge(Hit.event_time) as maxEventTime......"
We ran a full network check, simulating the health checks with curl and checking for packet loss with mtr, but found nothing: the network works flawlessly, with not a single lost packet or failed curl request.
We have the same problem with three ClickHouse nodes, even though the ClickHouse backends are reachable directly at the same time.
The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.
Using the following patch:
diff --git a/proxy.go b/proxy.go
index 11684b6..356cd2c 100644
--- a/proxy.go
+++ b/proxy.go
@@ -3,6 +3,7 @@ package main
import (
"context"
"fmt"
+ "net"
"net/http"
"net/http/httputil"
"net/url"
@@ -43,6 +44,24 @@ func newReverseProxy() *reverseProxy {
// Suppress error logging in ReverseProxy, since all the errors
// are handled and logged in the code below.
ErrorLog: log.NilLogger,
+ ErrorHandler: func(rw http.ResponseWriter, req *http.Request, err error) {
+ log.Errorf("http: proxy error: %v", err)
+ rw.WriteHeader(http.StatusBadGateway)
+ },
+ Transport: &http.Transport{
+ // DisableKeepAlives: false,
+ // Proxy: http.ProxyFromEnvironment,
+ DialContext: (&net.Dialer{
+ Timeout: 2 * time.Second,
+ KeepAlive: 2 * time.Second,
+ DualStack: true,
+ }).DialContext,
+ // ForceAttemptHTTP2: true,
+ // MaxIdleConns: 100,
+ // IdleConnTimeout: 90 * time.Second,
+ // TLSHandshakeTimeout: 10 * time.Second,
+ // ExpectContinueTimeout: 1 * time.Second,
+ },
},
reloadSignal: make(chan struct{}),
reloadWG: sync.WaitGroup{},
This seems to work fine so far. It is based on #121, but only lowers the keepalive time and timeouts.
It seems to make things better, but unfortunately it doesn't fix the problem:
Jul 07 11:48:00 mon01 chproxy[40652]: ERROR: 2021/07/07 09:48:00 proxy.go:48: http: proxy error: net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:58570->127.0.0.1:8123: write: broken pipe
@bzed I observed something similar, but I think the HTTP parameter <keep_alive_timeout>3</keep_alive_timeout> in CH corresponds to the IdleConnTimeout of the Transport in CHProxy.
So I raised keep_alive_timeout in CH to 90, the same value CHProxy uses. That works! The "cannot reach host" error is gone.
@mchades very good
Client keepalive should be substantially bigger than server keepalive: https://github.com/ClickHouse/ClickHouse/issues/52571#issuecomment-1650266265 https://github.com/ClickHouse/ClickHouse/pull/53068
-- update: bigger, not smaller
Tuning keep_alive_timeout as @mchades suggested did work! The following code shows the related HTTP Transport settings in chproxy.
transport := &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
		dialer := &net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}
		return dialer.DialContext(ctx, network, addr)
	},
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          cfgCp.MaxIdleConns,
	MaxIdleConnsPerHost:   cfgCp.MaxIdleConnsPerHost,
	IdleConnTimeout:       9 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}