clickhouse-operator
<Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below)
<Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
- Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11d0c84e in /usr/bin/clickhouse
- Poco::Net::SocketImpl::peerAddress() @ 0x11d0eac6 in /usr/bin/clickhouse
- DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0xe8de1ad in /usr/bin/clickhouse
- DB::TCPHandler::runImpl() @ 0xf64c6be in /usr/bin/clickhouse
- DB::TCPHandler::run() @ 0xf65f879 in /usr/bin/clickhouse
- Poco::Net::TCPServerConnection::start() @ 0x11d138af in /usr/bin/clickhouse
- Poco::Net::TCPServerDispatcher::run() @ 0x11d152c1 in /usr/bin/clickhouse
- Poco::PooledThread::run() @ 0x11e4b9e9 in /usr/bin/clickhouse
- Poco::ThreadImpl::runnableEntry(void*) @ 0x11e4784a in /usr/bin/clickhouse
- start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
- clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.3.12.2 (official build))
What's wrong? The ClickHouse server prints this error every second. Can anybody help me?
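A steady one-second cadence is worth confirming, because errors that arrive at a fixed interval are consistent with (though not proof of) some periodic checker connecting to the server. Full entries in clickhouse-server.err.log start with a timestamp, as in the 21.8 log quoted later in this thread (2021.10.05 12:42:35.258721 [ 251 ] ...). The sketch below pulls those timestamps out for Code: 1000 entries and prints the gaps between them. The helper names code_1000_times and intervals_seconds are hypothetical, and the regular expression assumes that timestamp format.

```python
import re
from datetime import datetime

# Matches the timestamp prefix of a ClickHouse log line that reports Code: 1000,
# e.g. "2021.10.05 12:42:35.258721 [ 251 ] {} <Error> ... Code: 1000, ..."
# (assumed format, taken from the log quoted in this thread).
LINE_RE = re.compile(r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) .*\bCode: 1000\b")


def code_1000_times(lines):
    """Return the datetimes of all "Code: 1000" entries, in file order.

    Stack-trace lines have no timestamp prefix, so they are skipped.
    """
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S.%f"))
    return times


def intervals_seconds(times):
    """Gaps between consecutive entries, in seconds."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

For example, `intervals_seconds(code_1000_times(open("/var/log/clickhouse-server/clickhouse-server.err.log")))` returns a list that stays close to one constant value if the connections come from a fixed-interval source.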
Did you deploy the ClickHouse server inside Kubernetes with clickhouse-operator?
Could you share your kind: ClickHouseInstallation manifest?
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch1"
spec:
  defaults:
    templates:
      podTemplate: default
  templates:
    podTemplates:
      - name: default
        spec:
          containers:
            - name: clickhouse-pod
              image: yandex/clickhouse-server:21.3
              volumeMounts:
                - name: data-storage-vc-template
                  mountPath: /var/lib/clickhouse
                - name: log-storage-vc-template
                  mountPath: /var/log/clickhouse-server
    volumeClaimTemplates:
      - name: data-storage-vc-template
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: alicloud-nas-subpath
          resources:
            requests:
              storage: 400Gi
      - name: log-storage-vc-template
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: alicloud-nas-subpath
          resources:
            requests:
              storage: 10Gi
  configuration:
    clusters:
      - name: "c1"
        layout:
          shardsCount: 1
@Slach
Yes, I deployed the ClickHouse server inside Kubernetes with clickhouse-operator.
I also encountered this problem. Did you find out the cause?
@czhfe @intfish123 is your Kubernetes cluster running in GCP or another cloud provider?
It looks like health-check probes from the cloud provider's external implementation of a Service with type: LoadBalancer.
They just open a TCP connection and then close it unexpectedly.
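To test that explanation, you can imitate such a probe by hand and watch clickhouse-server.err.log while it runs. The sketch below opens a TCP connection and closes it without sending a single request byte. It makes several assumptions. tcp_only_probe is a hypothetical helper. Port 8123 is assumed to be the HTTP port, matching the tcpdump command later in this thread; for the TCPHandler variant of the trace, the native protocol port (9000 by default) would be the one to probe. Whether a given load balancer ends its probes with a graceful FIN or an abortive RST is also an assumption, so both variants are offered.

```python
import socket
import struct


def tcp_only_probe(host, port=8123, abortive=False, timeout=3.0):
    """Connect to host:port and close again without sending any request bytes,
    the way a TCP-level health check does.

    abortive=True sets SO_LINGER to (on, 0 seconds), so close() sends a RST
    instead of a FIN. Which variant a real load balancer uses is an assumption.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if abortive:
        # struct linger { int l_onoff; int l_linger; } as laid out on Linux.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

For example, run `tcp_only_probe("127.0.0.1", abortive=True)` from a host that can reach the port, then check whether a new Code: 1000 entry appears at that moment.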
It is indeed a health-check problem.
I'm using Huawei Cloud, whose load balancer has health checks enabled by default. When I turn health checks off on my side, the problem goes away.
Could you tell me where I could have seen that this was the cause?
We've faced the same issue on GKE with the operator, but we have no LoadBalancer service, only a ClusterIP service.
Our log is a little different, though:
2021.10.05 12:42:35.258721 [ 251 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13b6c5ee in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13b6e836 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x1100708b in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0x11005eee in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x13b738af in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x13b7533a in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x13ca81b9 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x13ca444a in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /lib/libpthread.so.0
9. __clone @ 0x122293 in /lib/libc.so.6
(version 21.8.8.29 (official build))
Does anybody know how to solve this issue?
@nsakovich @ossinkine
The issue is not related to clickhouse-operator or to clickhouse-server itself.
It is caused by internal Kubernetes machinery (maybe a service mesh control-plane health check, maybe external load balancer health probes) that opens a TCP connection to the ClickHouse HTTP port and then closes it abnormally.
You could:
1. Run apt-get update && apt-get install tcpdump && tcpdump -i any -w clickhouse.pcap port 8123
2. Watch /var/log/clickhouse-server/clickhouse-server.err.log
3. Note the time of a Code: 1000 error
4. Open clickhouse.pcap in Wireshark and work out what is happening at the TCP level: which IP is responsible for a SYN -> SYN-ACK -> ACK -> FIN packet sequence with no HTTP request in between
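The Wireshark check described above can also be scripted. The sketch below reads the capture written by the tcpdump command and lists client connections to port 8123 that sent a SYN and were later closed with FIN or RST without sending any payload byte. For each one it reports the client IP and port, plus the capture time of the close in Unix epoch seconds, so you can match it against the Code: 1000 timestamps. It is a minimal sketch with several assumptions. It handles only classic pcap files (tcpdump's default -w format, not pcapng), only IPv4, and only the Ethernet and Linux "cooked" (SLL/SLL2) link types that tcpdump -i any produces. read_packets and silent_connections are hypothetical helper names.

```python
import socket
import struct

# (link-layer header length, offset of the EtherType/protocol field) per pcap
# link type: 1 = Ethernet, 113 = Linux cooked SLL, 276 = Linux cooked SLL2.
LINK_TYPES = {1: (14, 12), 113: (16, 14), 276: (20, 0)}
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10
LE_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # microsecond, nanosecond
BE_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def read_packets(path):
    """Yield (linktype, timestamp, packet_bytes) from a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        magic = header[:4]
        if magic in LE_MAGICS:
            endian = "<"
        elif magic in BE_MAGICS:
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        scale = 1e9 if magic in (LE_MAGICS[1], BE_MAGICS[1]) else 1e6
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        while True:
            record = f.read(16)
            if len(record) < 16:
                return
            ts_sec, ts_frac, incl_len, _ = struct.unpack(endian + "IIII", record)
            yield linktype, ts_sec + ts_frac / scale, f.read(incl_len)


def silent_connections(path, port=8123):
    """Return sorted (client_ip, client_port, close_time) tuples for connections
    to `port` that were closed (FIN or RST) without any client payload."""
    conns = {}
    for linktype, ts, pkt in read_packets(path):
        if linktype not in LINK_TYPES:
            continue
        l2_len, proto_off = LINK_TYPES[linktype]
        if len(pkt) < l2_len + 40:  # too short for IPv4 + TCP headers
            continue
        if struct.unpack("!H", pkt[proto_off:proto_off + 2])[0] != 0x0800:
            continue  # IPv4 only in this sketch
        ip = pkt[l2_len:]
        if ip[9] != 6:  # not TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        total_len = struct.unpack("!H", ip[2:4])[0]
        tcp = ip[ihl:]
        sport, dport = struct.unpack("!HH", tcp[:4])
        if dport != port:
            continue  # only the client -> server direction is tracked
        flags = tcp[13]
        # Use the IP total length, not len(pkt), so Ethernet padding is ignored.
        payload_len = total_len - ihl - (tcp[12] >> 4) * 4
        key = (socket.inet_ntoa(ip[12:16]), sport)
        if flags & SYN and not flags & ACK:
            conns[key] = {"payload": 0, "closed_at": None}  # new handshake
        elif key in conns:
            conns[key]["payload"] += max(payload_len, 0)
            if flags & (FIN | RST) and conns[key]["closed_at"] is None:
                conns[key]["closed_at"] = ts
    return sorted(
        (ip_addr, sport, c["closed_at"])
        for (ip_addr, sport), c in conns.items()
        if c["closed_at"] is not None and c["payload"] == 0
    )
```

Calling `silent_connections("clickhouse.pcap")` lists the suspected probe sources. A client IP that keeps reappearing at regular intervals is a good candidate for the health checker.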