Improve Upgrade Guide
The upgrade guide has a few problems:
- It says that Vitess uses semver. We are moving to something that is more correctly described as monotonic versioning, so we should remove this reference and say that upgrading from the previous version is always safe.
- The rolling updates text refers to YouTube in June 2016. I'm not sure it's still correct to say this. We need to check the state of Kubernetes and write something more accurate.
- It is incomplete. It should mention MySQL/OS version upgrades as well. This could be made correct by s/vttablet/tablet/ in the upgrade order, but I think we could make it clearer.
- It could explain which errors to expect with canaries. For me it is largely query compatibility and memory leaks. For MySQL it could be regressions in query plans or behavior differences, particularly between major versions.
@enisoc @deepthi can you comment on any other inaccuracies?
For the canary process, I would include testing of routine operations such as backups, planned reparents, shard splits/merges, etc.
Thanks @ameetkotian, good advice! I can see backups breaking across different versions, for example.
Planned reparents are risky with different versioned components. Perhaps if we don't support it in the canary, we should make it clear what the risks are of extended canaries like this. You will want to follow through with upgrades to prevent certain problems.
It used to be that we only ran mysql_upgrade upon restoring from backup. As far as I can tell from a quick code check, this appears to still be true.
When upgrading to a new MySQL version, we would rely on the fact that we (YouTube) ran tablets on ephemeral disks, which meant that the replacement tablet with the new version would always come up on an empty disk and restore from backup (thus triggering mysql_upgrade).
@morgo is mysql_upgrade even still necessary? If so, we may need to document this caveat in the upgrade guide until we figure out how to make sure we run it even if the disk is retained across binary upgrades.
As for what to look for on canaries, we mainly looked at healthchecks and metrics (error rate, changes in throughput, changes in cpu/mem usage, etc).
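To make the canary checks above concrete, here is a minimal sketch of comparing a canary tablet's metrics against a baseline. The metric names, the 10% tolerance, and the `canary_regressions` helper are all assumptions made up for illustration; they are not Vitess metric names or an existing tool.

```python
from typing import Dict, List

# Hypothetical metric names. For these, a higher canary value is worse.
HIGHER_IS_WORSE = {"error_rate", "cpu_seconds", "memory_bytes"}
# For these, a lower canary value is worse.
LOWER_IS_WORSE = {"queries_per_second"}


def canary_regressions(
    baseline: Dict[str, float],
    canary: Dict[str, float],
    tolerance: float = 0.10,  # assumed threshold: flag changes beyond 10%
) -> List[str]:
    """Return the names of metrics where the canary is worse than the
    baseline by more than `tolerance` (as a fraction of the baseline)."""
    flagged = []
    for name in sorted(baseline):
        if name not in canary:
            continue  # metric not reported by the canary; skip it
        base, new = baseline[name], canary[name]
        if name in HIGHER_IS_WORSE and new > base * (1 + tolerance):
            flagged.append(name)
        elif name in LOWER_IS_WORSE and new < base * (1 - tolerance):
            flagged.append(name)
    return flagged
```

A real check would likely compare over a time window rather than single samples, and would pull the numbers from whatever monitoring system is in use.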
@enisoc mysql_upgrade isn't necessary in 8.0, but it still needs to be run for prior versions and was a pain when we were running 5.7.
@derekperkins Have you thought about what an ideal solution would be for 5.7? Should we just run mysql_upgrade on every startup? Or try to detect if the version has changed and only run if necessary?
@morgo is obviously the right person to answer that question, but my thought was just to run it every time. It doesn't take long and should be safe and idempotent.
Yep, we can run it every time. It can take a long time between certain releases if tables need upgrading. There might be some details to work out.
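For comparison, the other option raised above (detect whether the version has changed and only run the upgrade if necessary) could be sketched roughly as follows. This is only an illustration under stated assumptions: the marker file name and both helper functions are hypothetical, not existing Vitess or MySQL behavior, and the caller is assumed to already know the current MySQL version string.

```python
from pathlib import Path

# Hypothetical marker file name; not a Vitess or MySQL convention.
MARKER_NAME = "last_mysql_version"


def needs_mysql_upgrade(data_dir: Path, current_version: str) -> bool:
    """Return True if the recorded version differs from the current one.

    A missing marker is treated as "upgrade needed" (an assumption that
    errs on the side of running the upgrade).
    """
    marker = data_dir / MARKER_NAME
    if not marker.exists():
        return True
    return marker.read_text().strip() != current_version


def record_version(data_dir: Path, current_version: str) -> None:
    """Record the version after a successful upgrade run."""
    (data_dir / MARKER_NAME).write_text(current_version + "\n")
```

The trade-off versus running it every time is an extra piece of state on disk that has to stay in sync with the data directory.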
I would be loath to make running mysql_upgrade a default option, since it would violate the principle of least surprise.
@aquarapid if you don't run it, then MySQL will just crash loop, which I contend is much worse and more surprising than auto upgrading.
I'm willing to lean on the MySQL team's choice to have mysqld run it by default and backport it for our users.
I didn't mean to prevent MySQL 8.x from running its upgrade now that it is internal, only in the 5.x cases where you need to launch it separately (and intentionally). I would not want (for example) the forced timestamp format upgrade from 5.6 -> 5.7 to hit me unawares. It's probably fine to make an upgrade option (like --auto-upgrade) the default in the k8s operator, but I would still not make it default tablet behavior.
That is what I meant by there being some details to work out if tables need upgrading. There is an option to avoid temporal upgrades, which was designed for this purpose. But if users follow our recommendations and use 250G shards, the risk is at least capped.