Kubernetes Chart Questions
Problem
A couple of questions about the Kubernetes chart:
- According to the cubestore production docs (https://cube.dev/docs/caching/running-in-production), the
[CUBESTORE_WORKERS](https://cube.dev/docs/reference/environment-variables#cubestore-workers) and [CUBESTORE_META_ADDR](https://cube.dev/docs/reference/environment-variables#cubestore-meta-addr) variables should be set to stable addresses.
Since that is the case, would it not make more sense to point the environment variables at the corresponding service rather than using headless services, etc.?
In the current scheme, adding a worker means changing the environment variable on all other workers to include the new one, which recreates every existing worker. Pointing at a service would let you scale cubestore workers horizontally without recreating them. Please let me know if there is a specific reason why this was not done.
- Why are cubestore workers set up as StatefulSets?
  - Are there any reasons apart from convenience, i.e. it auto-creates persistent volumes for each worker?
- Do you really need persistent volumes?
  - My understanding is that the cubestore workers use scratch space for their local `CUBESTORE_DATA_DIR`, whereas `CUBESTORE_REMOTE_DIR` should not be local to the pod anyway, e.g. either NFS or blob storage.
  - What is the relevance of persistent volumes for cubestore workers, and is there an underlying reason why persistent volumes are preferred over plain ephemeral storage?
Thank you!
@jxperf Could you please point out the place in the docs where it advises using StatefulSets? Generally speaking, you're right, and you can just use Deployments with scratch space. Adding a worker requires recreating the cluster because it triggers a repartition. Each worker owns a chunk of partitions based on stateless hash partitioning.
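To illustrate why a change in worker count moves partitions around under stateless hash partitioning, here is a minimal sketch. It assumes a simple "hash modulo worker count" assignment; the `owner` and `moved_partitions` functions are hypothetical and are not Cube Store's actual implementation, which may assign partitions differently.

```python
import hashlib


def owner(partition_id: int, worker_count: int) -> int:
    """Toy stateless assignment (an assumption, not Cube Store's real function):
    hash the partition id and take it modulo the number of workers."""
    digest = hashlib.sha256(str(partition_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def moved_partitions(partition_ids, old_count: int, new_count: int):
    """Return the partitions whose owning worker differs between the two worker counts."""
    return [p for p in partition_ids if owner(p, old_count) != owner(p, new_count)]


partitions = range(1000)
moved = moved_partitions(partitions, old_count=3, new_count=4)
print(f"{len(moved)} of {len(partitions)} partitions change owner when going from 3 to 4 workers")
```

Because ownership is derived only from the partition id and the worker count, nothing records who owned what before. In this toy model, changing the count reassigns a large share of partitions at once, which matches the point that adding a worker affects the whole cluster rather than just the new pod.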
@paveltiunov here are the example manifests, which use a StatefulSet for cubestore workers/routers: https://github.com/cube-js/cube/blob/master/examples/kubernetes/cluster/cubestore-workers-statefulset.yaml
Right, if that is the case, are there repercussions of pointing the CUBESTORE_WORKERS address at a Kubernetes service that round-robins across the cubestore worker pods?
When I read this in the documentation, I thought it was fine to put a service in front of the cubestore workers and assumed that we could add an additional worker with no repercussions. It might be good to note somewhere in the documentation the stateless hash partitioning and the need to recreate all cubestore workers because of repartitioning.
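For context, the stable per-pod addresses that a StatefulSet behind a headless Service provides can be sketched as below. The DNS pattern `<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local` is the standard Kubernetes one (assuming the default `cluster.local` cluster domain); the StatefulSet name, Service name, namespace, and port used here are hypothetical placeholders, and `worker_addresses` is an illustrative helper, not part of any chart.

```python
def worker_addresses(statefulset: str, service: str, namespace: str,
                     replicas: int, port: int) -> str:
    """Build a comma-separated host:port list, one entry per StatefulSet pod.

    Assumes the default cluster domain `cluster.local`.
    """
    return ",".join(
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    )


# Hypothetical names and port, for illustration only.
three = worker_addresses("cubestore-workers", "cubestore-workers", "default", 3, 9001)
four = worker_addresses("cubestore-workers", "cubestore-workers", "default", 4, 9001)
print(three)
print(four)
```

Going from three to four replicas appends one entry to the value of `CUBESTORE_WORKERS`, so every pod that carries this variable gets a changed spec and is rolled, which is the recreate behaviour described above.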
@jxperf You can put a service in front of the worker, but at every point in time, only one worker instance should be running behind the service to avoid any data consistency issues.
Hi @paveltiunov, what exactly do you mean by data consistency issues? Can you give an example of a scenario in which data inconsistency can occur?
Which cubestore mechanism causes this data inconsistency? Is it because the cubestore router needs to know exactly which workers exist at all times in order to route requests to the correct partition?
In the scenario of a service with multiple workers behind it, in what form does the data inconsistency manifest, i.e. will it fail to route to the correct worker, or will it return a random, wrong partition?
My apologies for resurrecting an old-ish thread, but I too am curious whether we can use Deployments instead of StatefulSets for router and workers. Many Helm charts/examples out there use StatefulSets (I came across a couple that don't), but it's not clear to me whether it's a case of "that's the way it's always been done" or whether there is a solid reason why we can't use Deployments.
Thanks for your kind help and clarification!
@eric-pierre I feel like you might find a wider audience if you ask this on https://slack.cube.dev in the #self-hosting channel. Also, maybe inviting @lvauvillier and @OpstimizeIcarus, authors of Cube-related Helm charts, might also help.