kubectl-operator
kubectl-operator copied to clipboard
Define strategy for migration from OLMv0 to OLMv1
As of #85, we have begun to add support for the OLMv1 APIs into this plugin via a new subcommand kubectl operator olmv1. So the current state of the world is:
- OLMv1 stuff at
kubectl operator olmv1 - OLMv0 stuff at
kubectl operator *(exceptolmv1)
Long term, I think the goal is:
- OLMv1 stuff at
kubectl operator - OLMv0 stuff no longer present.
How do we get from A to B?
Proposal 1
In #84, I proposed the use of an environment variable (I chose EXPERIMENTAL_USE_OLMV1_APIS, but we could pick something else), which would act as a switch controlling the entirety of the kubectl operator functionality:
EXPERIMENTAL_USE_OLMV1_APIS=on:kubectl operatoris OLMv1- Anything else:
kubectl operatoris OLMv0
My thought there was that we could perform the migration like this:
- T1. OLMv0 support only
- T2. Default: OLMv0, require EXPERIMENTAL_USE_OLMV1_APIS=on for OLMv1
- T3. Default: OLMv1, require EXPERIMENTAL_USE_OLMV1_APIS=off for OLMv0
- T4. OLMv1 support only
In this proposed model, OLMv1 users can essentially opt-in early (at T2) and then not have to worry about subcommand changes later. Also in this model, OLMv0 users can explicitly opt-out early to avoid a breakage at T3. This gives both sets of users more time to adjust to incoming changes.
Proposal 2
Another option is to perform this same transition, but using subcommand renaming:
- T1. OLMv0 support only at
kubectl operator - T2. OLMv0 support at
kubectl operator, OLMv1 support atkubectl operator olmv1 - T3. OLMv1 support at
kubectl operator, OLMv0 support atkubectl operator olmv0 - T4. OLMv1 support only at
kubectl operator
Proposal 3
Another option is a variation on Proposal 2, except we skip T3 and do a hard cutover:
- T1. OLMv0 support only at
kubectl operator - T2. OLMv0 support at
kubectl operator, OLMv1 support atkubectl operator olmv1 - T3. OLMv1 support only at
kubectl operator
My two cents
I had always imagined this working like proposal 1, which is similar to the way that Go modules were introduced in Go. IMO, the envvar approach keeps the CLI clean, makes for less typing for OLMv1 users, and will make the transition easy to implement.
In proposal 2 and 3, we'll definitely break early users of the OLMv1 functionality at some point when we remove the olmv1 subcommand and move that functionality under the main subcommand. And I feel like it would be nice if we could help them avoid that.
This plugin is still v0 (and may always be v0), so any of these proposals is fine from a semver standpoint. I'm mainly interested in choosing a strategy that primarily minimizes user impact and secondarily minimizes developer impact.
I view environment variables as somewhat dangerous, because the exact same command may have different and unexpected behavior. This makes it harder to debug when issues arise. Having the commands explicit on the command line (even though I know environment variables can be set on the same command, they can also be set in the environment, and not part of the command).
EXPERIMENTAL_USE_OLMV1_APIS=on is longer than olmv1.
If you put the variable in your environment, everything works with OLMv1, until you move to another shell without that variable set.
Using key words (proposals 2 and 3), at least lets you know exactly what command is being run, without having to know the environment.
Would a flag be a potential compromise between the different proposals? You could introduce a boolean flag like kubectl operator --olmv1={true|false}
The migration could look like:
- T1: OLMv0 support only at
kubectl operator - T2: OLMv0 support at
kubectl operator, OLMv1 support atkubectl operator --olmv1(i.e--olmv1=falseby default) - T3: OLMv1 support at
kubectl operator, OLMv0 support atkubectl operator --olmv1=false(i.e--olmv1=trueby default) & flag is deprecated with a message saying OLMv0 support is going away soon - T4: OLMv1 support only at
kubectl operator,--olmv1flag removed
@everettraven I had originally thought of flags (e.g. --format=olmv1) but it makes the help a bit convoluted, since only certain commands (e.g. install and uninstall) would support them. But it certainly is a possibility.
A possible knock against environment variables: command completion.
All good points!
A possible knock against environment variables: command completion.
You can still get OLMv1 command completion (you just have to set the envvar when the completion is generated). So yeah, it is an extra hoop to jump through, and maybe can't easily handle dynamic completion for OLMv0 and OLMv1 simultaneously.
But I kinda imagine that there won't be too many people that care about both modes at the same time.
And @everettraven re the flag idea: +1 to what @tmshort said about that making the help pretty wonky. There are a bunch of flags in the OLMv0 version of the commands that won't make sense in the OLMv1 version of the commands. So I think it's important that whatever we decide keeps those separate.
I think @tmshort makes some strong points in regard to the concerns about how users may not be aware of the environment or that it might change depending on where/how they run the command.
Right now, I think we are kind of assuming that you have either OLMv0 OR OLMv1, but not both, installed on a cluster. Under that assumption, we could do some sanity checks in a PreRun to make sure the cluster actually has the APIs that are about to be used, and if they don't then print out a message that hints at using the other mode. That might be something we want to do regardless of the approach we take though.
On the other hand, when we get to the point of figuring out the larger OLMv0 to OLMv1 migration, it may be an invalid assumption that a cluster has exclusively OLMv0 or OLMv1.
I feel proposal 1 is the cleanest approach as it has good CLI UX and clear expectation setting. Trying to manage v1 and v0 based operator deployments with the plugin is less likely to be something that we'll see demand for. And even in this case, switching the behavior temporarily feels like a good and conscious tradeoff to me compared to trying to auto-determine what APIs were used on the cluster to carry out the operations previously or pollute the CLI switch structure.
I am going to suggest another option. Use completely different keywords for OLMv1 vs. OLMv0.
So while we have for OLMv0:
kubectl operator install and kubectl operator uninstall.
So, perhaps OLMv1 has:
kubectl operator deploy and kubectl operator undeploy or something along those lines that is different from install and uninstall.
This eliminates the need to do any migration.
Or even create and delete, which are kubectl keywords.
create/delete is intriguing. I think my only qualm is that it seems like it would mean all the following would all be present in the help at the same time:
operator createoperator deleteoperator installoperator uninstall
Which would be pretty confusing in the interim when both sets of commands exist. But perhaps this new cobra "command groups" feature could help with that: https://github.com/spf13/cobra/pull/1003
We'll have to think about what we'd do with the OLMv0 upgrade subcommand as well.