operator-controller
[Feature] Rollout Canaries across the OLM fleet
Problem

The ACS operator runs on over 1,500 clusters, most of which use automatic upgrades. As soon as we publish a new version of the ACS operator to the OpenShift catalog, all of these clusters try to upgrade immediately. A failure in the upgrade process therefore affects every cluster at once, which results in many support tickets and a high impact on customer environments.
Solution

It would be great to have control over the rollout. Ideally, we could configure the order in which clusters upgrade automatically, so that if a rollout fails we can halt it and fix the issue before it continues.
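A minimal sketch of an ordered rollout that halts on failure, in Python. It assumes a hypothetical fleet-level coordinator that knows the upgrade order of the clusters and can upgrade a single cluster and report whether it succeeded; `plan_waves`, `run_rollout`, `RolloutResult`, and the per-wave failure budget `max_failures_per_wave` are illustrative names, not part of OLM or operator-controller.

```python
"""Sketch of an ordered canary rollout that halts on failure.

Assumptions: a fleet-level coordinator (hypothetical here) knows the
upgrade order of the clusters and can upgrade one cluster and report
whether it succeeded. The `upgrade` callback stands in for that step.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    upgraded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    halted_at_wave: Optional[int] = None  # None means the rollout finished


def plan_waves(clusters: Sequence[str], wave_sizes: Sequence[int]) -> List[List[str]]:
    """Split an ordered cluster list into waves.

    wave_sizes gives the sizes of the first waves (e.g. [1, 10, 100]);
    all remaining clusters form one final wave.
    """
    waves: List[List[str]] = []
    start = 0
    for size in wave_sizes:
        if start >= len(clusters):
            break
        waves.append(list(clusters[start:start + size]))
        start += size
    if start < len(clusters):
        waves.append(list(clusters[start:]))
    return waves


def run_rollout(
    waves: Sequence[Sequence[str]],
    upgrade: Callable[[str], bool],
    max_failures_per_wave: int = 0,
) -> RolloutResult:
    """Upgrade wave by wave; stop as soon as a wave exceeds its failure budget.

    Clusters after the halt point are never touched, so the issue can be
    fixed before the rollout is resumed.
    """
    result = RolloutResult()
    for index, wave in enumerate(waves):
        failures = 0
        for cluster in wave:
            if upgrade(cluster):
                result.upgraded.append(cluster)
            else:
                result.failed.append(cluster)
                failures += 1
                if failures > max_failures_per_wave:
                    result.halted_at_wave = index
                    return result
    return result
```

For example, `plan_waves(clusters, [1, 2])` splits six clusters into a single canary, a wave of two, and a final wave of the remaining three. If one upgrade in the second wave fails, `run_rollout` stops immediately and the last wave is never upgraded, which is the "halt and fix first" behavior asked for above.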
Alternatively, an "upgrade available" endpoint that gates upgrades would be helpful. The operator would expose this endpoint and OLM would query it; once the endpoint reports ready, OLM upgrades the operator. To implement canaries, the operator would connect to an ACS/Red Hat server, which lets us implement the rollout process to suit our needs. See the related issue on an upgrades endpoint: https://github.com/operator-framework/helm-operator-plugins/issues/232
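A minimal sketch of the endpoint handshake, using only the Python standard library. The path `/upgrade-available`, the `{"ready": ...}` response body, and the helper names `make_handler` and `should_upgrade` are hypothetical; OLM does not query such an endpoint today. The operator side answers from an `is_ready` callback, which in the proposal would consult an ACS/Red Hat rollout server; the OLM side upgrades only on an explicit `{"ready": true}`.

```python
"""Sketch of the proposed "upgrade available" handshake.

Assumptions (not an existing OLM or operator-controller API):
  * the operator serves GET /upgrade-available and answers {"ready": true|false};
  * OLM upgrades only when that answer is exactly {"ready": true}.
"""
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable
from urllib.request import urlopen

UPGRADE_PATH = "/upgrade-available"  # hypothetical path


def make_handler(is_ready: Callable[[], bool]):
    """Operator side: build an HTTP handler that reports is_ready().

    In the proposal, is_ready() would ask an ACS/Red Hat rollout server
    whether it is this cluster's turn to upgrade.
    """

    class UpgradeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != UPGRADE_PATH:
                self.send_error(404)
                return
            body = json.dumps({"ready": bool(is_ready())}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this sketch

    return UpgradeHandler


def should_upgrade(base_url: str, timeout: float = 5.0) -> bool:
    """OLM side: query the operator and decide whether to upgrade.

    Errors (unreachable endpoint, HTTP error, malformed JSON) count as
    "not ready", so a broken endpoint holds the upgrade back instead of
    letting it through.
    """
    try:
        with urlopen(base_url + UPGRADE_PATH, timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and data.get("ready") is True


if __name__ == "__main__":
    state = {"ready": False}  # stands in for the rollout server's decision
    server = HTTPServer(("127.0.0.1", 0), make_handler(lambda: state["ready"]))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(should_upgrade(url))  # False: the rollout has not reached this cluster
    state["ready"] = True
    print(should_upgrade(url))  # True: OLM may now upgrade the operator
    server.shutdown()
    server.server_close()
```

Treating every error as "not ready" is a deliberate choice in this sketch: a crashed or misconfigured endpoint pauses the upgrade rather than triggering the fleet-wide upgrade this issue is trying to avoid.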