clickhouse-go
Better load balancing for server connections
From what I see in the current v2 code, the library runs "ch.acquire" before every operation to get an active ClickHouse server. I believe we could reduce latency by moving this step out of the query path.
Basically, a dedicated goroutine would first add all servers to a pool and then monitor every endpoint; if there is an issue with a server, it removes that server from the pool.
In that case, ch.acquire only picks one of the active servers from the pool and runs the operation against that machine. We do something similar in our load-balanced API gateway, and it handles thousands of requests per second.
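A minimal sketch of the idea above, not the driver's actual code: the `Pool` type, `NewPool`, `Refresh`, `Monitor`, `Acquire`, and the `check` callback are all hypothetical names, and the health check is a stand-in for whatever probe (for example a ping) a real implementation would use. A background goroutine maintains the list of healthy servers, so acquiring one does no network I/O.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// ErrNoHealthyServer is returned when every endpoint failed its last check.
var ErrNoHealthyServer = errors.New("no healthy server available")

// Pool is a hypothetical health-checked address pool (not part of clickhouse-go).
type Pool struct {
	mu      sync.RWMutex
	all     []string
	healthy []string
	next    atomic.Uint64
	check   func(addr string) bool // assumed probe, e.g. a ping with a short timeout
}

// NewPool registers all servers and runs one initial health check.
func NewPool(addrs []string, check func(string) bool) *Pool {
	p := &Pool{all: addrs, check: check}
	p.Refresh()
	return p
}

// Refresh probes every endpoint and swaps in the new healthy list.
func (p *Pool) Refresh() {
	healthy := make([]string, 0, len(p.all))
	for _, addr := range p.all {
		if p.check(addr) {
			healthy = append(healthy, addr)
		}
	}
	p.mu.Lock()
	p.healthy = healthy
	p.mu.Unlock()
}

// Monitor calls Refresh on every tick until stop is closed.
// It is meant to run in its own goroutine.
func (p *Pool) Monitor(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			p.Refresh()
		case <-stop:
			return
		}
	}
}

// Acquire returns the next healthy address in round-robin order.
// It only reads in-memory state, so it stays off the network.
func (p *Pool) Acquire() (string, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if len(p.healthy) == 0 {
		return "", ErrNoHealthyServer
	}
	n := p.next.Add(1) - 1
	return p.healthy[n%uint64(len(p.healthy))], nil
}

func main() {
	down := map[string]bool{"ch-2:9000": true} // pretend ch-2 is unreachable
	pool := NewPool(
		[]string{"ch-1:9000", "ch-2:9000", "ch-3:9000"},
		func(addr string) bool { return !down[addr] },
	)
	stop := make(chan struct{})
	go pool.Monitor(time.Second, stop)
	defer close(stop)

	for i := 0; i < 4; i++ {
		addr, err := pool.Acquire()
		fmt.Println(addr, err)
	}
}
```

The trade-off in this sketch is staleness: a server that fails between two checks can still be handed out until the next `Refresh`, so callers would still need to handle connection errors.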
Are you proposing changing how the pool works or allowing a specific connection to be used? Selecting a connection from the idle pool should be cheap - you can always increase the size of this in heavy workloads. Connections are only created if the idle pool is empty.
What would you prefer? A fixed pool size?
I'm actually proposing changing how the pool works. But it would also be a good idea to let people use a specific group of ClickHouse machines, maybe by implementing some kind of workgroup or priority-based model?
Since ClickHouse Cloud is a serverless solution at the moment, maybe this isn't a big deal for ClickHouse customers. But on-prem users might want to dedicate some of their ClickHouse servers to reads only. Another scenario of ours is having groups of ClickHouse servers in two different regions: try the servers in the first region and, if none is available there, redirect the request to the other region.
I'm just brainstorming. Maybe this is too much work for this driver. But load balancing could work as an external plugin (like ch-go), and this plugin would only supply the server for each request. Users could implement their own load balancing or contribute to an existing one directly. Still just brainstorming. :)
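The region-failover scenario could look roughly like the sketch below. Everything here is hypothetical and independent of the driver: `ServerGroup`, `PickServer`, and the `available` callback are illustrative names, and the addresses are made up. Groups are tried in priority order, and the first available server wins.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAllGroupsDown is returned when no group has an available server.
var ErrAllGroupsDown = errors.New("no available server in any group")

// ServerGroup is a hypothetical named set of servers, e.g. one region
// or a set of read-only replicas.
type ServerGroup struct {
	Name  string
	Addrs []string
}

// PickServer walks the groups in priority order (index 0 first) and returns
// the first server that the available callback accepts.
func PickServer(groups []ServerGroup, available func(addr string) bool) (group, addr string, err error) {
	for _, g := range groups {
		for _, a := range g.Addrs {
			if available(a) {
				return g.Name, a, nil
			}
		}
	}
	return "", "", ErrAllGroupsDown
}

func main() {
	groups := []ServerGroup{
		{Name: "region-a", Addrs: []string{"a1:9000", "a2:9000"}},
		{Name: "region-b", Addrs: []string{"b1:9000"}},
	}
	// Pretend the whole first region is unavailable.
	down := map[string]bool{"a1:9000": true, "a2:9000": true}

	group, addr, err := PickServer(groups, func(a string) bool { return !down[a] })
	fmt.Println(group, addr, err) // falls back to region-b
}
```

The same shape would cover the read-only case: a caller would pass only the group dedicated to reads when picking a server for a SELECT.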
Thanks.
There was a request to expose how connections are selected from the pool. Maybe this would solve your requirement? i.e. allow the acquire function to be custom?
@gingerwizard I would like the acquire function to be customizable.
My project clickhouse_sinker maintains a shard-aware connection pool. If a write fails with a retryable error (such as a ZooKeeper session expiry), it retries on another replica within the same shard. There is also a per-replica connection limit.
Retrying within the same shard helps leverage the ReplicatedMergeTree deduplication feature.
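As a rough illustration of the retry-within-a-shard idea (not clickhouse_sinker's actual code), the sketch below tries each replica of one shard in turn and only moves on when the error is retryable. `Shard`, `WriteToShard`, `ErrRetryable`, and the `write` callback are hypothetical names; the per-replica connection limit is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRetryable marks errors worth retrying on another replica
// (a stand-in for e.g. a ZooKeeper session expiry).
var ErrRetryable = errors.New("retryable error")

// Shard lists the replica addresses of a single shard.
type Shard struct {
	Replicas []string
}

// WriteToShard sends the same block to the replicas of one shard, one at a
// time, until a write succeeds. It never crosses into another shard, so a
// retried block can still be deduplicated by ReplicatedMergeTree.
func WriteToShard(shard Shard, write func(replica string) error) (string, error) {
	lastErr := errors.New("shard has no replicas")
	for _, replica := range shard.Replicas {
		err := write(replica)
		if err == nil {
			return replica, nil
		}
		if !errors.Is(err, ErrRetryable) {
			return replica, err // non-retryable: give up immediately
		}
		lastErr = err
	}
	return "", fmt.Errorf("all replicas failed: %w", lastErr)
}

func main() {
	shard := Shard{Replicas: []string{"s1-r1:9000", "s1-r2:9000"}}
	// Simulated writer: the first replica fails with a retryable error.
	write := func(replica string) error {
		if replica == "s1-r1:9000" {
			return ErrRetryable
		}
		return nil
	}
	replica, err := WriteToShard(shard, write)
	fmt.Println(replica, err)
}
```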
Agree we need to do this @yuzhichang