<Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below)

Open intfish123 opened this issue 3 years ago • 11 comments

<Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

  1. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d0c84e in /usr/bin/clickhouse
  2. Poco::Net::SocketImpl::peerAddress() @ 0x11d0eac6 in /usr/bin/clickhouse
  3. DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0xe8de1ad in /usr/bin/clickhouse
  4. DB::TCPHandler::runImpl() @ 0xf64c6be in /usr/bin/clickhouse
  5. DB::TCPHandler::run() @ 0xf65f879 in /usr/bin/clickhouse
  6. Poco::Net::TCPServerConnection::start() @ 0x11d138af in /usr/bin/clickhouse
  7. Poco::Net::TCPServerDispatcher::run() @ 0x11d152c1 in /usr/bin/clickhouse
  8. Poco::PooledThread::run() @ 0x11e4b9e9 in /usr/bin/clickhouse
  9. Poco::ThreadImpl::runnableEntry(void*) @ 0x11e4784a in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so (version 21.3.12.2 (official build))

intfish123 avatar Jun 02 '21 12:06 intfish123

What's wrong? The ClickHouse server prints this error every second. Can anybody help me?

intfish123 avatar Jun 02 '21 12:06 intfish123

Did you deploy the ClickHouse server inside Kubernetes with clickhouse-operator? Could you share your kind: ClickHouseInstallation manifest?

Slach avatar Jun 03 '21 12:06 Slach

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch1"
spec:
  defaults:
    templates:
      podTemplate: default
  templates:
    podTemplates:
      - name: default
        spec:
          containers:
            - name: clickhouse-pod
              image: yandex/clickhouse-server:21.3
              volumeMounts:
                - name: data-storage-vc-template
                  mountPath: /var/lib/clickhouse
                - name: log-storage-vc-template
                  mountPath: /var/log/clickhouse-server
    volumeClaimTemplates:
      - name: data-storage-vc-template
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: alicloud-nas-subpath
          resources:
            requests:
              storage: 400Gi
      - name: log-storage-vc-template
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: alicloud-nas-subpath
          resources:
            requests:
              storage: 10Gi
  configuration:
    clusters:
      - name: "c1"
        layout:
          shardsCount: 1

@Slach

intfish123 avatar Jun 06 '21 03:06 intfish123

Yes, I deployed the ClickHouse server inside Kubernetes with clickhouse-operator.

intfish123 avatar Jun 06 '21 03:06 intfish123

I also encountered this problem. Did you find out what causes it?

czhfe avatar Jul 12 '21 10:07 czhfe

@czhfe @intfish123 is your Kubernetes running inside GCP or another cloud provider?

This looks like a health-check probe from the cloud provider's external implementation of a Service of type: LoadBalancer. Such probes just open a TCP connection and then close it unexpectedly.
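
To make the suspected behaviour concrete, here is a minimal Python sketch of what such a probe might do: open a TCP connection and close it without sending any data. The function name `tcp_probe` and its parameters are hypothetical, and whether a given load balancer closes with RST or FIN is an assumption, so the sketch supports both.

```python
import socket
import struct


def tcp_probe(host: str, port: int, timeout: float = 1.0, reset: bool = True) -> bool:
    """Open a TCP connection and close it without sending any data.

    Returns True if the connection was established, False otherwise.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    try:
        if reset:
            # SO_LINGER enabled with a zero timeout makes close() send RST
            # instead of the normal FIN handshake (Linux struct linger layout).
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return True
```

Calling something like `tcp_probe("127.0.0.1", 8123)` in a loop against a test server may reproduce the log line, although that depends on timing and has not been confirmed in this thread.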

Slach avatar Jul 12 '21 11:07 Slach

It is indeed a health-check problem.

I'm using Huawei Cloud, whose load balancer has health checks enabled by default. When I turn health checks off, the problem goes away.

Could you tell me where I can see that this is the cause?

czhfe avatar Jul 13 '21 06:07 czhfe

We've faced the same issue on GKE with the operator, but we have no load balancer service, only a ClusterIP service.

ossinkine avatar Oct 05 '21 12:10 ossinkine

But our log is a little bit different:

2021.10.05 12:42:35.258721 [ 251 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13b6c5ee in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13b6e836 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x1100708b in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0x11005eee in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x13b738af in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x13b7533a in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x13ca81b9 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x13ca444a in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /lib/libpthread.so.0
9. __clone @ 0x122293 in /lib/libc.so.6
 (version 21.8.8.29 (official build))

ossinkine avatar Oct 05 '21 12:10 ossinkine

does anybody know how to solve this issue?

nsakovich avatar Dec 09 '21 15:12 nsakovich

@nsakovich @ossinkine the issue is not related to clickhouse-operator or clickhouse-server itself. It is related to internal Kubernetes machinery (maybe a service mesh control plane health check, maybe external load balancer health probes)

which creates a TCP connection to the ClickHouse HTTP port and closes it abnormally.

To find the source, you could run apt-get update && apt-get install tcpdump && tcpdump -i any -w clickhouse.pcap port 8123, then watch /var/log/clickhouse-server/clickhouse-server.err.log until a Code: 1000 error appears and note its time. After that, open clickhouse.pcap in Wireshark and look at what is going on at the TCP level around that time, in particular which IP is responsible for a SYN -> SYN ACK -> ACK -> FIN packet sequence without any HTTP request.
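
To line the error log up with the capture, the error timestamps can be pulled out of the log first. The following Python sketch assumes the log line format shown earlier in this thread (timestamp, thread id, then the `Code: 1000` message); the helper name `error_times` is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021.10.05 12:42:35.258721 [ 251 ] {} <Error> ServerErrorHandler: ... Code: 1000, ...
LOG_LINE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r".*Code: 1000\b.*Socket is not connected"
)


def error_times(lines):
    """Return the timestamps of all 'Socket is not connected' errors."""
    times = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S.%f"))
    return times


if __name__ == "__main__":
    # Path taken from the comment above; adjust it if your logs live elsewhere.
    with open("/var/log/clickhouse-server/clickhouse-server.err.log") as log:
        for ts in error_times(log):
            print(ts.isoformat())
```

The printed times can then be used to jump to the matching packets in Wireshark. Keep in mind that the log and the capture may display times in different time zones.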

Slach avatar Dec 09 '21 15:12 Slach