Upgrade Job Documentation
Hi!
I'm wondering how one would use the upgrade-job in production. Is there any documentation for how one would use it? It seems like upgrades should be handled by creating new images and rolling out new pods; I'm just not sure how the job template helps with that...
> upgrades should be handled by creating new images and rolling out new pods
Agreed, would be really nice if we could do it this way.
I'm not sure how relevant it is to Tableau on Kubernetes, but there is some documentation on how to upgrade Tableau on Linux: https://help.tableau.com/current/server-linux/en-us/sug_plan.htm
Ah, I found some more useful documentation. It's not under an explicit title in the left-hand sidebar, so it's hard to find. You have to scroll down a lot in https://help.tableau.com/current/server-linux/en-us/server-in-container_image.htm to find it. Search for "Upgrading Tableau Server Versions" and there is a guide that explains how to upgrade Tableau with Docker. Kubernetes would presumably follow a similar process.
I just went through the guide.
> - Create an upgrade-image using the build-upgrade-image script. The new version's Tableau Server RPM is needed to build this container.
I hit an error while building the upgrade container. I contacted support about it about a month ago with no response, which is typical. I just went ahead and used the upgrade container anyway, and it seemed to work.
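For reference, the build step came down to something like the following. The flag and file names are illustrative from my notes, not official values; the exact options vary by release, so check the script's help output in your bundle:

```bash
# Build an upgrade image from the existing Tableau Server image plus the new
# version's RPM. Flag and file names here are illustrative; they vary by
# release, so check ./build-upgrade-image --help for the exact options.
./build-upgrade-image \
  --installer tableau-server-2022-1-0.x86_64.rpm \
  -i tableau-server:2021.4.0
```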
> - Shut down the container currently running Tableau Server.
> - Start the upgrade-image, mounting the same data directory from the container shut down in the previous step.
> - The upgrade process takes a while, but the Tableau Server will be upgraded; check the docker logs for progress updates. The container will shut down after the upgrade process completes.
This part went smoothly. I didn't time the upgrade but it took somewhere between a few minutes and an hour.
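Concretely, those steps looked roughly like this for me. Container names, image tags, and the host data directory are placeholders; adjust them to your own setup:

```bash
# Stop the container currently running Tableau Server (names are placeholders).
docker stop tableau-server

# Run the upgrade image against the same data directory the old container
# used; the upgrade runs on startup and the container exits when it is done.
docker run -d --name tableau-upgrade \
  -v /data/tableau:/var/opt/tableau \
  tableau-server-upgrade:2022.1.0

# Follow the upgrade progress; the container shuts itself down when finished.
docker logs -f tableau-upgrade
```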
> Start a new Tableau Server in a Container of the newer version. Mount the same data directory from the previous steps.
I got a startup failure. I had to run `chmod 0600 /var/opt/tableau/tableau_server/data/tabsvc/config/hyper_0.20221.22.0823.1450/hyperSecurity/hyper.root.key`. For some reason the file defaults to 0660 permissions, which causes an error. This issue has already been reported to Tableau; I'm waiting for them to fix it. After I fixed the file permissions, the new version started up successfully.
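If you hit the same thing, you can apply the workaround before starting the new version. The `hyper_*` directory name embeds the build number, so a glob avoids hardcoding it. Paths below assume the data directory is mounted from `/data/tableau` on the host, which is just my placeholder:

```bash
# Work around the hyper.root.key permissions bug before starting the new
# version. The hyper_* directory name embeds the build number, so glob for it
# rather than hardcoding it. /data/tableau is my host-side mount; adjust.
chmod 0600 /data/tableau/tableau_server/data/tabsvc/config/hyper_*/hyperSecurity/hyper.root.key

# Then start the new version against the same data directory.
docker run -d --name tableau-server \
  -v /data/tableau:/var/opt/tableau \
  tableau-server:2022.1.0
```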
In summary, the current process is somewhat buggy and could be a lot easier, but it does work.
Hey @rockpunk, were you able to get Tableau running on k8s using the templates they have given?
Thank you for all the links and tips @caleb15.
I tried the upgrade procedure using an upgrade image, but in my setup it doesn't work. I'm running 3 nodes: 1 primary and 2 workers. To run the upgrade image I need to stop the primary node and then mount that node's disk in the upgrade job pod. However, the worker nodes can't connect to the upgrade pod, because that pod isn't resolvable under the same hostname as the primary node. I can set the primary's hostname for the process running in the upgrade job pod, but that doesn't mean the workers can actually reach the upgrade pod's processes under that name. Since the workers can't reach the upgrade pod, the cluster isn't fully functional and the upgrade never actually happens.
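For anyone who wants to dig further, the kind of Service indirection I was missing would look something like this. This is an untested sketch, and every name in it is a placeholder for whatever your own manifests use:

```bash
# Untested sketch: give the upgrade pod a label that the primary's Service
# selects, so workers resolving the primary's name get the upgrade pod.
# All names here are placeholders for whatever your own manifests use.
kubectl label pod tableau-upgrade-job-xxxxx role=tableau-primary

# Alternatively, create a headless Service with the hostname the workers
# already use, selecting the upgrade pod, so DNS resolves to its pod IP.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: tableau-primary
spec:
  clusterIP: None
  selector:
    role: tableau-primary
EOF
```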
I'll try upgrading using the backup/restore method instead, which is also mentioned in the docs linked in this thread.
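If that route works out, it should boil down to the standard tsm backup/restore commands, run in the old and new containers respectively. The file name is a placeholder:

```bash
# In the old (current-version) container: back up the repository and data.
# tsm appends the .tsbak extension to the file name automatically.
tsm maintenance backup -f ts_backup

# In a fresh container running the new version: stop the server processes,
# restore the backup, and start back up. The backup file name is a placeholder.
tsm stop
tsm maintenance restore -f ts_backup.tsbak
tsm start
```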
Sorry, necroposting a bit for posterity, but no, I never got Tableau working on k8s in production. It was too much of a pain, especially given that Tableau, as a massive monolith, is nearly unmaintainable. Not only does the setup script generate a huge, multi-gigabyte pod, but it takes something like 40 minutes for all the services just to start on that single, massive pod.
Our Tableau rep mentioned they were working internally on splitting their myriad microservices out into a native k8s deployment way back when, but I got the vibe that it would never happen. Given that it's been two years and the official docs haven't changed on the subject, I wouldn't hold my breath.
We ended up running on overprovisioned EC2 boxes instead, though the archaic enterprise software still takes way too long to start up whenever you do maintenance that requires a service restart. Good luck!