elasticsearch-operator
How to upgrade Elasticsearch to a new version?
I was wondering how you would go about upgrading to a newer version of elasticsearch using this operator.
Hey @ejsmith, upgrading in place is something that would need to be implemented. You can specify the docker image in the TPR, so you could change the version there, but that won't roll over to the other components.
Today (since it's not implemented yet), you could update the deployments and the statefulset yourself, which would get you to the new version; a rough sketch of that follows below.
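For illustration only, here is a minimal client-go sketch of that manual path. The namespace, the StatefulSet name (`es-data`), the container name (`elasticsearch`) and the image tag are assumptions, not the operator's actual object names:

```go
// Sketch: manually bumping the ES image on the data StatefulSet.
// All object names below are assumptions for illustration; the client
// Deployment would be updated the same way.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	sts, err := client.AppsV1().StatefulSets("default").Get(context.TODO(), "es-data", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for i := range sts.Spec.Template.Spec.Containers {
		if sts.Spec.Template.Spec.Containers[i].Name == "elasticsearch" {
			// New version of the base image (tag is an example only).
			sts.Spec.Template.Spec.Containers[i].Image = "quay.io/pires/docker-elasticsearch-kubernetes:5.3.1"
		}
	}
	if _, err := client.AppsV1().StatefulSets("default").Update(context.TODO(), sts, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```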
I changed the base image in main.go and am trying to bake a container with ES 5.3.1, and I'm getting errors around peer discovery. Looking at your Dockerfile for ES, I see that you're not installing any discovery plugins, but I can't find where you're providing any other mechanism in the code. Could you shed a little light on how this operator handles peer discovery?
You should be able to pass args to the container to override the default image (baseImage).
The base images are built off of the work @pires has done. You can find that repo here: https://github.com/pires/docker-elasticsearch-kubernetes
My customization only adds the TLS encryption plugin to enable secure communication. I need to upgrade to the latest version and I haven't tried that yet. Let me see what's going on with a newer version and I'll get back to you.
Discovery hasn't relied on plug-ins for quite some time; it uses DNS instead. I released the container image for 5.3.1 very recently and haven't had the time to try it out properly. Feel free to open an issue in the repo @stevesloka pointed out above.
Well, I think I've resolved my issue. I was using his elasticsearch.yml with your controller, so it was looking for the fabric8 plugin. I'll continue to test and report back.
@jroberts235 where exactly is my elasticsearch.yml looking for the fabric8 plug-in?
Sorry, my mistake. It wasn't your config; the config I had was using the fabric8 plugin.
I'm reviving this issue because I would like to implement a global update of the ES spec, including the version and everything that requires a node restart. After reviewing the operator code, it seems I will need to do a fairly big refactor, so I am proposing a solution for opinions:
- Because the ES cluster is implemented over 3 K8S objects (2 statefulsets and 1 deployment), we will need to evaluate the need for an update/upgrade (vs. creation) and trigger it in the processor, not in the k8sutil functions anymore. Global operations like disabling shard reallocation and running a synced flush need to be done before launching each individual update. It also needs to handle operator crashes that could happen in the middle of an upgrade. I propose to put this logic in the processor, as it needs the global view.
- Decomposing the statefulset creation into two phases would bring benefits: generating the STS object, then injecting it into the K8S cluster (creation or update).
- I propose to divide the ES spec into two categories: the specs related to K8S objects and the specs related to the ES cluster. Changes in K8S specs generally don't need to trigger an ES update (with all the related process), in contrast to changes in the ES spec, which will trigger pod restarts. The only special case is a change in storage class, but that requires a backup/restore and a cluster restart anyway. This split would allow us to quickly compare specs and decide whether an upgrade is required (a rough sketch of that comparison follows below).
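To make the split concrete, here is a hypothetical sketch of how the two categories could be compared to decide which kind of update is needed. None of these type or field names exist in the operator today; they are illustrative only:

```go
// Hypothetical spec split: everything here is illustrative, not the
// operator's actual types.
package spec

import "reflect"

// K8sSpec groups settings that only affect the Kubernetes objects.
type K8sSpec struct {
	DataReplicas   int32
	ClientReplicas int32
	CPU            string
	Memory         string
	StorageClass   string // special case: changing it needs backup/restore
}

// EsSpec groups settings that require restarting the Elasticsearch nodes.
type EsSpec struct {
	Version   string // e.g. "5.3.1"
	BaseImage string
	JavaOpts  string
}

// ClusterSpec is the full spec exposed in the TPR.
type ClusterSpec struct {
	K8s K8sSpec
	Es  EsSpec
}

// UpdateKind tells the processor which path to take.
type UpdateKind int

const (
	NoChange UpdateKind = iota
	UpdateWithoutRestart
	UpdateWithRestart
)

// Classify compares the current and desired specs and decides which kind
// of update (if any) is needed.
func Classify(current, desired ClusterSpec) UpdateKind {
	switch {
	case !reflect.DeepEqual(current.Es, desired.Es),
		current.K8s.StorageClass != desired.K8s.StorageClass:
		return UpdateWithRestart
	case !reflect.DeepEqual(current.K8s, desired.K8s):
		return UpdateWithoutRestart
	default:
		return NoChange
	}
}
```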
Finally, the high-level algorithm design would be:
- generate the K8S objects
- check whether it's an update with pod restart, an update without pod restart, or a creation
- if it's a creation, create the K8S objects
- if it's an update with pod restart, launch the pre-update ES tasks (synced flush, disabling shard reallocation, ...), use the STS rolling update with an iterative partition decrease and ES cluster status checks between steps, then launch the post-update ES tasks (see the sketch below)
- if it's an update without pod restart, just update the K8S objects
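A rough sketch of the "update with pod restart" branch, assuming the new pod template has already been pushed with the partition initially set to spec.replicas. The helper names, endpoints wiring and hard-coded timeouts are made up for illustration, and error handling is kept minimal:

```go
// Sketch of the rolling update with iterative partition decrease.
// Assumes the StatefulSet template already carries the new image and
// that its partition starts at spec.replicas, so nothing rolls yet.
package processor

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// esPut sends a JSON body to the Elasticsearch REST API.
func esPut(esURL, path, body string) error {
	req, err := http.NewRequest(http.MethodPut, esURL+path, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// clusterStatus returns green/yellow/red from _cluster/health.
func clusterStatus(esURL string) (string, error) {
	resp, err := http.Get(esURL + "/_cluster/health")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var health struct {
		Status string `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&health); err != nil {
		return "", err
	}
	return health.Status, nil
}

// RollingUpgrade walks the partition down one pod at a time, waiting for
// the cluster to report green again before touching the next pod.
func RollingUpgrade(ctx context.Context, kube kubernetes.Interface, ns, stsName, esURL string) error {
	// Pre-update ES tasks: stop shard reallocation and do a synced flush.
	if err := esPut(esURL, "/_cluster/settings",
		`{"transient":{"cluster.routing.allocation.enable":"none"}}`); err != nil {
		return err
	}
	resp, err := http.Post(esURL+"/_flush/synced", "application/json", nil)
	if err != nil {
		return err
	}
	resp.Body.Close()

	sts, err := kube.AppsV1().StatefulSets(ns).Get(ctx, stsName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if sts.Spec.UpdateStrategy.RollingUpdate == nil {
		sts.Spec.UpdateStrategy.RollingUpdate = &appsv1.RollingUpdateStatefulSetStrategy{}
	}

	for p := *sts.Spec.Replicas - 1; p >= 0; p-- {
		partition := p
		sts.Spec.UpdateStrategy.RollingUpdate.Partition = &partition
		if sts, err = kube.AppsV1().StatefulSets(ns).Update(ctx, sts, metav1.UpdateOptions{}); err != nil {
			return err
		}
		// ES cluster status check: poll until the replaced pod has rejoined
		// and the cluster is green again.
		deadline := time.Now().Add(10 * time.Minute)
		for {
			if status, err := clusterStatus(esURL); err == nil && status == "green" {
				break
			}
			if time.Now().After(deadline) {
				return fmt.Errorf("cluster did not return to green at partition %d", partition)
			}
			time.Sleep(10 * time.Second)
		}
	}

	// Post-update ES tasks: re-enable shard reallocation.
	return esPut(esURL, "/_cluster/settings",
		`{"transient":{"cluster.routing.allocation.enable":"all"}}`)
}
```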
Any opinion on this?
Hey @vgkowski! Thanks for the interest! I think I'm on board with most of what you have proposed. Can you provide any more detail on the need for 2 objects (e.g. k8s & es specs)?
Also, the big thing I haven't done yet, and that needs to be done before any upgrade or change to the cluster, is implementing health checks. We need to be able to monitor the status of the cluster so we can perform reboots, upgrades, etc. (something along the lines of the sketch below).
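Not the operator's actual API, but one possible shape for such a health check: poll _cluster/health and expose enough of it for the processor to decide whether the next step of a reboot or upgrade is safe.

```go
// One possible shape for the missing health check; illustrative only.
package health

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// ClusterHealth mirrors the fields of GET /_cluster/health that matter here.
type ClusterHealth struct {
	Status           string `json:"status"` // green, yellow, red
	NumberOfNodes    int    `json:"number_of_nodes"`
	RelocatingShards int    `json:"relocating_shards"`
	UnassignedShards int    `json:"unassigned_shards"`
}

// Check fetches the current cluster health from the ES HTTP endpoint.
func Check(esURL string) (*ClusterHealth, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(esURL + "/_cluster/health")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s from _cluster/health", resp.Status)
	}
	var h ClusterHealth
	if err := json.NewDecoder(resp.Body).Decode(&h); err != nil {
		return nil, err
	}
	return &h, nil
}

// SafeToRestartNode is the kind of predicate the processor could use
// between rolling-upgrade steps.
func (h *ClusterHealth) SafeToRestartNode() bool {
	return h.Status == "green" && h.RelocatingShards == 0 && h.UnassignedShards == 0
}
```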
I will try to split the effort into several PRs, starting with the health check. Regarding the need for 2 objects, I thought it was more readable for users and made it clear that changing the ES spec would rolling-restart the cluster. It would also maybe improve code readability and help with checking whether a rolling upgrade is needed?
I thought it was more readable for users and clear to identify that changing ES specs would rolling restart the cluster.
Only when necessary. I'd rather have manual updates for now (including moving data between clusters) than full cluster reboots during upgrades - it's a dangerous process.