clickhouse-operator
Is there a Project Roadmap?
Cool project! 🎉
My team is deeply invested in ClickHouse. We are already running it in production, and we want to build a new cluster in Kubernetes. This project looks promising because I tried it and it is simple to use. We are planning to use something like this operator in production. What we would like to know is whether there are any known issues, and what future features or enhancements are planned.
We are happy to hear you are using ClickHouse and interested in our operator. It is still an early development version, but it should be ready for use in a couple of months. We do have an internal roadmap. It is not public, but we can publish it. What features might you be interested in?
The latest version has a bug related to replication configuration and macros. We will push a fix on Monday or Tuesday.
If it's not too much of a concern, it would be really cool if you could make that public for interested developers like us. I myself am just starting with ClickHouse, and this operator helped me understand how a replicated merge tree should look in production.
These are some cool features that would add more automated management to the lifecycle of a cluster:
- backup to external storage
- data recovery, which sounds like a really good business case
In addition to the previous post, it would be very cool to have cluster maintenance tasks automated, such as:
- updating the CH version on each node
- updating config files (config.d, users.d, etc.) on each node, possibly with substitution parameters (like $IP replaced with the node's IP address); this would allow reconfiguring existing clusters (like adding or removing nodes) and changing CH parameters globally to enable/disable certain features
- running maintenance SQL scripts on each node, for instance scripts which cannot use the CH [ON CLUSTER cluster] feature.
Thank you for the feedback. Are you using the operator already?
Answering your questions:
- updating the CH version on each node
There is an example of how it can be done if you need maximum transparency: https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi_update_clickhouse_version.md
If you do not care much, just change the clickhouse-server version in the template and apply it.
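For illustration only, here is a minimal sketch of that second approach, assuming a pod template in the ClickHouseInstallation spec (the installation name, template name and image tag below are placeholders):

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"                    # placeholder installation name
spec:
  defaults:
    templates:
      podTemplate: clickhouse-pod # all pods are built from this template
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              # bumping this tag and re-applying the manifest rolls the
              # installation to the new ClickHouse version
              image: yandex/clickhouse-server:19.6.2.11
```

Re-applying the manifest with `kubectl apply -f` lets the operator notice the change and update the pods.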
- updating config files (config.d, users.d, etc.) on each node, possibly with substitution parameters (like $IP replaced with the node's IP address), which would allow reconfiguring existing clusters (like adding or removing nodes) and changing CH parameters globally to enable/disable certain features
Once you change the configuration and apply it, it is propagated to all installation nodes.
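As a rough sketch of what that looks like in the installation spec (assuming the operator's spec.configuration section; the specific settings and users below are only examples):

```yaml
spec:
  configuration:
    # server settings, generated into config files and distributed to every node
    settings:
      max_concurrent_queries: 100
      compression/case/method: zstd
    # user definitions, likewise distributed to every node (users.d-style)
    users:
      readonly/profile: readonly
      readonly/networks/ip: "::/0"
```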
- running maintenance SQL scripts on each node, for instance scripts which cannot use the CH [ON CLUSTER cluster] feature
Could you provide an example of such scripts/tasks?
In fact, we are currently working on a document that covers typical operations and troubleshooting.
Hello,
No, we don't use it yet. I've only recently found it and am trying to understand what benefits it can provide.
Currently we use Azure ARM templates to set up a CH cluster of a given size in Azure. They rely on bash scripts and substitution parameters like $IP to add node-specific data (such as the IP address) to each node's config files, so a single ARM template plus a few parameterized config files creates the whole cluster. All nodes sit behind a load balancer that connects a client to a random node and also allows connecting to each specific node directly via an additional port. It could be good to switch from that to Kubernetes; it would make it easier to be cloud-agnostic if we want to move from Azure to AWS, for instance.
As for examples of SQL maintenance scripts, we thought of something like:
- manually re-balancing existing shard data if new nodes are added to a cluster
- running a custom CH script when some custom replica repair/troubleshooting/research is needed
- other custom scripts which should bypass the load balancer and run on a particular subset of nodes.
But anyway, I guess we can use clickhouse-copier and the remote table function; that might be enough to solve the tasks mentioned above. Another question is about HTTPS for client connections: is it supported by the operator?
Thanks!
Oleg
Hi Oleg,
Absolutely, moving to k8s makes you cloud-agnostic. That was one of our main motivations for developing the operator. We do have our own tools to manage ClickHouse in Amazon (using Terraform), and they could be ported to other clouds, but k8s solves the portability problem in a general way.
Speaking of your points:
- Re-balancing is a good example. We planned to add it to the operator but later decided it is better implemented in ClickHouse itself. Stay tuned.
- Replica recovery/troubleshooting is typically a very manual operation. We have a lot of experience dealing with customer issues, and we are very cautious about automating those.
- The operator automates many routine operations like upgrades, configuration deployments, node type changes and so on.
- Yes, we can configure HTTPS. It is done via proper configuration of ClickHouse as well as k8s service annotations, if necessary. I was looking for an example, but we do not have one yet; it will be added.
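For what it's worth, a rough sketch of the ClickHouse side of such a setup (https_port and the openSSL options are standard ClickHouse server settings; the certificate paths, and mounting the certificate into the pod, are assumptions left to the pod template):

```yaml
spec:
  configuration:
    settings:
      https_port: 8443
      # the certificate and key have to be mounted into the pod,
      # e.g. from a Kubernetes Secret via the pod template
      openSSL/server/certificateFile: /etc/clickhouse-server/certs/server.crt
      openSSL/server/privateKeyFile: /etc/clickhouse-server/certs/server.key
```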
Thanks, Alexander
Hi guys,
Thank you for your operator, it works really well.
For backups, it is possible to add a sidecar container with the backup project. I think it would be good to add example documentation for this; maybe I will push a pull request later.
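To make the idea concrete, a sketch of what such a sidecar could look like in a pod template (the backup image and the way it reaches the ClickHouse data are assumptions; this is not something the operator documents today):

```yaml
spec:
  templates:
    podTemplates:
      - name: clickhouse-with-backup
        spec:
          containers:
            - name: clickhouse
              image: yandex/clickhouse-server:19.6.2.11
            - name: backup
              # hypothetical sidecar image; clickhouse-backup is one option,
              # and it would need access to the same data volume as the
              # clickhouse container (plus a schedule/command of its own)
              image: alexakulov/clickhouse-backup:latest
```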
I have one question about resharding. You said:
> Re-balancing is a good example. We planned to add it to the operator but later decided it is better implemented in ClickHouse itself. Stay tuned.
Can you confirm that today resharding (scaling shards and rebalancing data onto the new shards) can only be done with clickhouse-copier?
Are there any plans to make this process simpler (in your project or in the ClickHouse project)?