helm.sh/v3:Release, unclear how to recover from "another operation (install/upgrade/rollback) is in progress"
What happened?
After deploying the Release initially, everything was fine, but when the chart later failed to deploy because of a configuration issue, it reported:
~ kubernetes:helm.sh/v3:Release airflow **updating failed** [diff: ~values]; error: another operation (install/upgrade/rollback) is in progress
I've been unable to figure out how to recover from this, and was forced to pulumi destroy the stack and re-create it. A pulumi cancel didn't seem to have any effect.
Steps to reproduce
- Deploy a stack to a Kubernetes cluster like this: https://gist.github.com/shousper/3d0a1cb83ad235276f59f8a95d4a4bba
- On first deploy, it'll hang because it is (purposefully) misconfigured. Pulumi will wait forever; you'll see a Job called fail-airflow-run-airflow-migrations that will keep failing.
- Cancel the deployment.
- Run pulumi cancel if you like; it won't matter.
- Run pulumi up. The preview will "look okay", but it'll fail when it attempts to apply the changes.
Expected Behavior
Running pulumi cancel should restore normal operation, and/or Pulumi should give clearer instructions on what action is required to resolve the stuck "operation in progress".
Actual Behavior
Stack becomes inoperable and must be destroyed.
Versions used
CLI
Version 3.33.2
Go Version go1.17.10
Go Compiler gc
Plugins
NAME VERSION
aws 5.6.0
docker 3.2.0
kubernetes 3.19.3
nodejs unknown
random 4.8.0
Host
OS darwin
Version 12.4
Arch x86_64
This project is written in nodejs (/Users/cmcgregor/.asdf/shims/node v14.19.2)
Current Stack: staging-data-airflow-dags
TYPE URN
pulumi:pulumi:Stack urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:Stack::data-airflow-dags-staging-data-airflow-dags
pulumi:providers:aws urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:aws::default_5_6_0
pulumi:providers:pulumi urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:pulumi::default
aws:cloudwatch/logGroup:LogGroup urn:pulumi:staging-data-airflow-dags::data-airflow-dags::aws:cloudwatch/logGroup:LogGroup::workers
pulumi:providers:kubernetes urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:kubernetes::default_3_19_3
pulumi:providers:random urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:random::default_4_8_0
kubernetes:core/v1:Namespace urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace::airflow
random:index/randomPassword:RandomPassword urn:pulumi:staging-data-airflow-dags::data-airflow-dags::random:index/randomPassword:RandomPassword::default-user-password
random:index/randomString:RandomString urn:pulumi:staging-data-airflow-dags::data-airflow-dags::random:index/randomString:RandomString::webserver-secret-key
aws:ecr/repository:Repository urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$aws:ecr/repository:Repository::airflow
kubernetes:monitoring.coreos.com/v1:ServiceMonitor urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:monitoring.coreos.com/v1:ServiceMonitor::airflow
kubernetes:core/v1:Secret urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::dags-ssh-key
kubernetes:core/v1:Secret urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::oauth
awsx:ecr:Repository urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$awsx:ecr:Repository::airflow
kubernetes:core/v1:Secret urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::webserver
pulumi:pulumi:StackReference urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:StackReference::shared-database
aws:ecr/lifecyclePolicy:LifecyclePolicy urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$awsx:ecr:Repository$aws:ecr/lifecyclePolicy:LifecyclePolicy::airflow
pulumi:pulumi:StackReference urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:StackReference::infra
kubernetes:core/v1:Secret urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::data-metadata-connection
kubernetes:core/v1:Secret urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::connections
aws:iam/role:Role urn:pulumi:staging-data-airflow-dags::data-airflow-dags::aws:iam/role:Role::worker
kubernetes:core/v1:ServiceAccount urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:ServiceAccount::airflow-worker
kubernetes:helm.sh/v3:Release urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:helm.sh/v3:Release::airflow
Found no pending operations associated with staging-data-airflow-dags
Backend
Name maelstrom.local
URL s3://<redacted>
User cmcgregor
Organizations
Dependencies:
NAME VERSION
import-sort-parser-typescript 6.0.0
import-sort-style-module-scoped 1.0.3
@pulumi/kubernetes 3.19.3
@pulumi/pulumi 3.34.1
@types/js-yaml 4.0.5
@typescript-eslint/eslint-plugin 5.28.0
@typescript-eslint/parser 5.28.0
eslint-plugin-unused-imports 2.0.0
eslint-plugin-prettier 4.0.0
import-sort 6.0.0
typescript 4.7.3
@types/node 14.18.21
eslint 8.17.0
eslint-config-prettier 8.5.0
js-yaml 4.1.0
prettier-plugin-import-sort 0.0.7
@pulumi/aws 5.6.0
@pulumi/random 4.8.0
prettier 2.7.0
Pulumi locates its logs in /var/folders/p0/zkdhxd596q31sxqykz4shh6c0000gn/T/ by default
Additional context
Originally raised in Pulumi's Community slack: https://pulumi-community.slack.com/archives/CRFURDVQB/p1655436629559579
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
@viveklak Any idea if there's a workaround for this?
@shousper thanks for filing the issue. By default, Helm Release installs in wait mode, i.e. it waits for the underlying chart resources to be installed. You can skip this by setting the skipAwait flag: https://www.pulumi.com/registry/packages/kubernetes/api-docs/helm/v3/release/#skipawait_nodejs. It seems there is a class of issues with Helm itself where, if a blocking update/install is interrupted, it may leave the release in an inconsistent state. You may have to consider a workaround like the one described here to recover: https://github.com/helm/helm/issues/4558#issuecomment-648352068
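For reference, a minimal sketch of what setting that flag looks like on a Release (the chart, repository, and values below are placeholders, not the configuration from the gist):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Hypothetical sketch: skipAwait stops Pulumi from blocking until the chart's
// resources (e.g. a failing migrations Job) become ready, so an interrupted
// update is less likely to strand the release mid-operation.
const airflow = new k8s.helm.v3.Release("airflow", {
    chart: "airflow",
    repositoryOpts: {
        repo: "https://airflow.apache.org", // placeholder repository
    },
    namespace: "airflow",
    skipAwait: true, // don't wait for chart resources to be installed
    values: {
        // ... chart values ...
    },
});
```

The trade-off is that with skipAwait the update reports success as soon as Helm accepts the release, so readiness failures surface in the cluster rather than in the Pulumi run.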
@viveklak Thanks! So just using the helm CLI to perform the rollback? No worries, I'll give that a go and report back if there are any problems ✌🏻
So using the helm CLI to uninstall/rollback a release appears to unblock pulumi from taking further action 🎉 However, pulumi still reports this warning upon subsequent update, despite everything being okay:
warning: Attempting to deploy or update resources with 1 pending operations from previous deployment.
* urn:pulumi:staging-kafka-cruise-control::kafka-cruise-control::kubernetes:helm.sh/v3:Release::cruise-control-oauth-proxy, interrupted while creating
I ran a pulumi stack export and found there were still some pending_operations in the state. So I ran pulumi cancel against the stack, and exported it again, but the pending operations were still there. Perhaps just a gap in the way pulumi cancel is handled for some resources?
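For anyone else hitting the same warning: assuming the resources themselves are actually fine, one way to clear the leftover pending operations is to edit the state by hand. Run pulumi stack export --file state.json, remove the entries under pending_operations, then run pulumi stack import --file state.json. Keep the original export around as a backup; this is a manual workaround, not an official fix.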
I'm seeing this frequently when the host/pulumi process dies. The way I've been resetting it is to just delete the Helm release secret via k9s. I think pulumi should add an option to roll back existing installs that may be stuck before it installs new Helm releases.
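For reference, the non-k9s equivalent would be deleting the stuck Helm release secret directly, roughly kubectl -n airflow delete secret -l owner=helm,name=airflow,status=pending-upgrade (or status=pending-install if the very first install was interrupted), or rolling back to the last deployed revision with helm rollback airflow -n airflow. The exact label values, and whether rolling back or deleting is appropriate, depend on where the release got stuck.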