clickhouse-operator
DNS resolution failure over TCP for ClickHouse in restricted UDP environment
Description:
In our environment, DNS resolution over UDP is blocked, so we've configured pods to use TCP for DNS resolution instead. Testing with ping confirms that DNS resolution over TCP works, as the service name resolves successfully. However, ClickHouse is unable to resolve the service name over TCP and returns an error.
Steps to Reproduce:
- Block UDP DNS resolution in the environment.
- Configure pods to use TCP for DNS resolution.
dnsConfig:
  options:
    - name: use-vc
- Run ping to verify TCP DNS resolution, which works as expected.
ping chi-test-test-1-2.default.svc.cluster.local
- Attempt to start or use ClickHouse with the above DNS configuration.
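The ping check can also be reproduced without ICMP. The following is a minimal sketch, assuming ping resolves names through the system libc resolver (the place where the use-vc option from resolv.conf is applied on glibc) and that Python is available in the pod. The helper name resolve_via_libc is hypothetical.

```python
import socket

def resolve_via_libc(host):
    """Return the sorted, de-duplicated addresses that getaddrinfo reports for host."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        # Resolution failed; report the error instead of raising.
        print(f"cannot resolve {host}: {exc}")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_via_libc("chi-test-test-1-2.default.svc.cluster.local") inside the pod should list the same address that ping prints if the libc path is working. An empty list would point at the resolver configuration rather than at ClickHouse.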
Observed Behavior:
ClickHouse fails to resolve the service name over TCP, generating the following error:
2024.11.14 18:20:17.660787 [ 48 ] {c1e33f52-b6b1-45e6-b1e0-c24514136aa9} <Error> DNSResolver: Cannot resolve host (chi-test-test-1-2.default.svc.cluster.local), error 0: Host not found
However, running ping within the pod resolves the service name as expected:
PING chi-test-test-1-2.default.svc.cluster.local (10.42.0.123) 56(84) bytes of data.
64 bytes from chi-test-test-1-2-0.chi-test-test-1-2.default.svc.cluster.local (10.42.0.123): icmp_seq=1 ttl=64 time=0.038 ms
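To separate the transport from the resolver library, one could also query the nameserver directly over TCP. The sketch below builds a plain A-record query and frames it with the two-byte length prefix that DNS over TCP uses. The function names are hypothetical, the nameserver address is an assumption (for example the first nameserver entry in /etc/resolv.conf), and the code handles neither truncated responses nor labels longer than 63 bytes.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    """Build a DNS query for an A record, framed for TCP transport."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    message = header + question
    # DNS over TCP prefixes each message with its length as a 2-byte big-endian integer.
    return struct.pack(">H", len(message)) + message

def _recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if the peer closes early."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data

def query_over_tcp(name, nameserver, timeout=3.0):
    """Send the query to TCP port 53 and return (rcode, answer_count)."""
    with socket.create_connection((nameserver, 53), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        (length,) = struct.unpack(">H", _recv_exact(sock, 2))
        response = _recv_exact(sock, length)
    _, flags, _, ancount, _, _ = struct.unpack(">HHHHHH", response[:12])
    return flags & 0x000F, ancount
```

An rcode of 0 with a non-zero answer count from query_over_tcp would suggest that the nameserver answers over TCP, which narrows the problem down to how the process resolves names.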
Expected Behavior:
ClickHouse should be able to resolve service names over TCP in environments where UDP DNS is blocked, similar to the successful resolution observed with ping.
Additional Context:
Are there any known limitations with ClickHouse’s DNS resolver over TCP? Any recommendations or configurations to resolve this issue would be helpful.
This issue is not related to clickhouse-operator, but I'm not sure whether the standard Go library that we use in clickhouse-operator also honors use-vc and uses DNS over TCP by default.
The typical use case for DNS over TCP is responses that are too large for UDP.
Why did you restrict the standard DNS approach?
You're correct, but we are working in an environment within an enterprise bank where custom DNS servers (coreDNS/kube-dns) or hosts are not permitted.
This is part of a proof of concept (POC) where we aim to demonstrate our application, which utilizes ClickHouse.
@Slach Do you have any suggestions for a potential workaround?
@arthurpassos could you suggest something about DNS over TCP in the clickhouse-server DNSResolver?
@arthurpassos Can you suggest any possible workarounds?
A setting that controls the protocol could be introduced, something like dns_resolution_protocol=[any|udp|tcp].
ClickHouse uses the Poco library to perform DNS resolutions. Poco, under the hood, uses libc getaddrinfo.
The getaddrinfo function takes an addrinfo structure that has the option to set the protocol: any, udp or tcp afaik. The thing is that Poco does not have an abstraction that allows addrinfo to be set manually.
Options available:
- stop using Poco and call getaddrinfo manually.
- submit a PR to the Poco library introducing such an API, and then update our Poco fork.
- update our Poco fork only.
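As a rough illustration of the addrinfo hints mentioned above, Python's socket.getaddrinfo exposes the same socket type and protocol fields. This is only a sketch: the helper name addresses_with_hints is hypothetical, and as far as I know these hints filter which entries are returned rather than choosing the transport of the DNS query itself, so the effect on DNS over TCP would need verifying before relying on it.

```python
import socket

def addresses_with_hints(host, socktype=0, proto=0):
    """Call getaddrinfo with socket type/protocol hints; return (proto, address) pairs."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_UNSPEC, type=socktype, proto=proto
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return sorted({(info[2], info[4][0]) for info in infos})
```

For example, addresses_with_hints(host, socket.SOCK_STREAM, socket.IPPROTO_TCP) restricts the result to TCP entries, while the defaults of 0 correspond to "any".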
I looked at the code again: Poco lives in base/poco, so there is no need to submit a PR to Poco or update our fork. It is bundled together, which is easier.
Editing Poco...DNS::hostByName to accept a protocol parameter is easy, though it won't work on systems that do not have getaddrinfo.
After that, one needs to make sure all DNS function calls specify the protocol based on the setting. Not very scalable, but it is the same thing with proxy support.
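The shape of that change can be sketched in Python rather than in the actual C++ code. Everything here is hypothetical: host_by_name stands in for a hostByName that accepts a protocol parameter, Settings models only the proposed dns_resolution_protocol option, and the mapping from setting values to getaddrinfo hints is an assumption rather than anything ClickHouse or Poco implements today.

```python
import socket

# Hypothetical setting values, mirroring the proposed dns_resolution_protocol.
_HINTS = {
    "any": (0, 0),
    "udp": (socket.SOCK_DGRAM, socket.IPPROTO_UDP),
    "tcp": (socket.SOCK_STREAM, socket.IPPROTO_TCP),
}

def host_by_name(host, protocol="any"):
    """Hypothetical analogue of a hostByName that accepts a protocol parameter."""
    if protocol not in _HINTS:
        raise ValueError(f"unknown dns_resolution_protocol: {protocol!r}")
    if not hasattr(socket, "getaddrinfo"):
        # Illustrative fallback for platforms without getaddrinfo:
        # the protocol parameter cannot be honoured on this path.
        return socket.gethostbyname_ex(host)[2]
    socktype, proto = _HINTS[protocol]
    infos = socket.getaddrinfo(host, None, type=socktype, proto=proto)
    return sorted({info[4][0] for info in infos})

class Settings:
    """Stand-in for server settings; only the hypothetical option is modelled."""
    def __init__(self, dns_resolution_protocol="any"):
        self.dns_resolution_protocol = dns_resolution_protocol

def resolve(host, settings):
    # Every call site has to forward the setting explicitly,
    # which is the scalability concern raised above.
    return host_by_name(host, settings.dns_resolution_protocol)
```

A call site would then look like resolve(host, Settings("tcp")). Keeping "any" as the default leaves existing callers unchanged until they are updated to pass the setting.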