
pulumi destroy --exclude-protected fails with the error `grpc: the client connection is closing` when a kubernetes.Provider is in the stack

Open jancespivo opened this issue 2 years ago • 9 comments

What happened?

pulumi destroy --exclude-protected fails with the error grpc: the client connection is closing when a kubernetes.Provider is in the stack. The bug was introduced in Pulumi version 3.74.0.

Example

import pulumi
import pulumi_gcp as gcp
import pulumi_google_native.container.v1 as container
import pulumi_kubernetes as kubernetes

compute_service = gcp.projects.Service(
    "compute.googleapis.com",
    service="compute.googleapis.com",
)

container_service = gcp.projects.Service(
    "container.googleapis.com",
    service="container.googleapis.com",
    opts=pulumi.ResourceOptions(
        depends_on=[compute_service],
    )
)

cluster = container.Cluster(
    "cluster",
    node_pools=[
        container.NodePoolArgs(
            name="default",
            initial_node_count=1,
            config=container.NodeConfigArgs(
                machine_type="e2-custom-4-8192",
                disk_type='pd-balanced',
            ),
            locations=[
                "europe-west1-b",
            ]
        )
    ],
    initial_cluster_version="1.25",
    location="europe-west1-b",
    opts=pulumi.ResourceOptions(
        depends_on=[container_service],
    )
)

# The provider's kubeconfig depends on the cluster's outputs.
k8s_provider = kubernetes.Provider(
    "k8s_provider",
    kubeconfig=cluster.get_kubeconfig(),
)

If I create this stack with pulumi up, pulumi destroy --exclude-protected fails.

If I remove the kubernetes.Provider resource and update the stack with pulumi up, then pulumi destroy --exclude-protected starts working.

Output of pulumi about

$ pulumi about
CLI
Version      3.94.2
Go Version   go1.21.4
Go Compiler  gc

Plugins
NAME           VERSION
gcp            7.2.1
google-native  0.31.1
kubernetes     4.5.4
python         unknown

Host
OS       arch
Version  "23.1.0"
Arch     x86_64

This project is written in python: executable='/home/beer/projects/tacr/platform/cluster-deploy/.venv/bin/python3' version='3.11.5'

Current Stack: organization/cluster/minimal

TYPE                                URN
pulumi:pulumi:Stack                 urn:pulumi:minimal::cluster::pulumi:pulumi:Stack::cluster-minimal
pulumi:providers:gcp                urn:pulumi:minimal::cluster::pulumi:providers:gcp::default_7_2_1
gcp:projects/service:Service        urn:pulumi:minimal::cluster::gcp:projects/service:Service::container.googleapis.com
gcp:projects/service:Service        urn:pulumi:minimal::cluster::gcp:projects/service:Service::compute.googleapis.com
pulumi:providers:google-native      urn:pulumi:minimal::cluster::pulumi:providers:google-native::default_0_31_1
google-native:container/v1:Cluster  urn:pulumi:minimal::cluster::google-native:container/v1:Cluster::cluster
pulumi:providers:pulumi             urn:pulumi:minimal::cluster::pulumi:providers:pulumi::default
pulumi:providers:kubernetes         urn:pulumi:minimal::cluster::pulumi:providers:kubernetes::k8s_provider

Found no pending operations associated with minimal

Backend
Name   beer-notebook
URL    gs://cluster-deployment--wmghdg/
User   beer
Organizations
Token type  personal

Dependencies:
NAME                  VERSION
pip                   23.2.1
pulumi-gcp            7.2.1
pulumi-google-native  0.31.1
pulumi-kubernetes     4.5.4
setuptools            65.5.0

Pulumi locates its logs in /tmp by default

Additional context

No response


jancespivo avatar Nov 24 '23 20:11 jancespivo

> The bug was introduced in Pulumi version 3.74.0

This works on 3.73.0?

> pulumi destroy --exclude-protected

Anything special about --exclude-protected here? It doesn't look like any of your resources are protected so I'd expect this to behave the same with or without that flag, does it?

Frassle avatar Nov 24 '23 21:11 Frassle

Yes and yes :) pulumi destroy without --exclude-protected works fine. Version 3.73.0 is not affected. pulumi destroy --exclude-protected stopped working in 3.74.0, regardless of whether anything in the stack is protected or not.

jancespivo avatar Nov 24 '23 21:11 jancespivo

I can reproduce this. Running pulumi destroy --exclude-protected with pulumi v3.91.1 and earlier works. Running with v3.92.0 fails with error: rpc error: code = Canceled desc = grpc: the client connection is closing.

The associated change in v3.92.0 is https://github.com/pulumi/pulumi/pull/14436

Previously, any errors in DiffConfig were ignored. Now they aren't.

Verbose logs confirm:

I1128 17:02:24.620334   73511 log.go:81] Provider[kubernetes, 0x14002288740].DiffConfig(urn:pulumi:dev::k8sprov::pulumi:providers:kubernetes::k8s_provider) provider received rpc error `Canceled`: `grpc: the client connection is closing`

Which corresponds with:

https://github.com/pulumi/pulumi/blob/bafa1f9a3fd7497751b97c0aae41527a7fb464a3/sdk/go/common/resource/plugin/provider_plugin.go#L456-L457

justinvp avatar Nov 29 '23 01:11 justinvp

Those version numbers make more sense. Could probably repro this in an engine test with that info.

Frassle avatar Nov 29 '23 09:11 Frassle

We are encountering the problem even with version 3.91.1 :man_shrugging:

jancespivo avatar Dec 01 '23 16:12 jancespivo

Any news about this bug? Still happening in v3.115.2.

msxdan avatar May 09 '24 23:05 msxdan

I'm seeing the same symptoms with destroy --target-dependents -t, but destroying the dependents first worked, so it probably depends on what the resources are. In my case specifically the target configures another provider that is used for the dependents.

jan-hudec avatar May 13 '24 13:05 jan-hudec

After hitting it once more, this time with a plain recreate plan, I think the actual condition for triggering this is having a resource use a provider that depends on outputs of another resource that is being destroyed.

The original post does show having a pulumi_kubernetes.Provider depend on the output of a kubernetes cluster, but does not include any resource that would use that provider. But there must be some to reproduce the issue, because when I target-delete them, the problem goes away and the rest of the destruction proceeds.
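To make that concrete, a minimal sketch of what would complete the repro per this hypothesis, extending the original example (the Namespace resource and its name are hypothetical, not taken from the thread; `k8s_provider` is the provider defined in the original program):

```python
import pulumi
import pulumi_kubernetes as kubernetes

# Hypothetical resource that uses the provider whose kubeconfig
# depends on the cluster's outputs. Per the hypothesis above, some
# resource like this must exist for the destroy error to trigger.
app_namespace = kubernetes.core.v1.Namespace(
    "app-ns",  # placeholder name
    opts=pulumi.ResourceOptions(provider=k8s_provider),
)
```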

jan-hudec avatar May 16 '24 12:05 jan-hudec

I have the same bug. Since it happened after I renamed some resources, I tried PULUMI_K8S_DELETE_UNREACHABLE=true pulumi up --stack dev to update my stack and remove the old resources. However, even now pulumi destroy --stack dev --exclude-protected gives me the grpc error, so I'm not sure if that can be linked to having renamed resources, but hopefully it can help someone to debug or recreate the issue.

havetc avatar Jun 25 '24 15:06 havetc

> After hitting it once more, this time with a plain recreate plan, I think the actual condition for triggering this is having a resource use a provider that depends on outputs of another resource that is being destroyed.
>
> The original post does show having a pulumi_kubernetes.Provider depend on the output of a kubernetes cluster, but does not include any resource that would use that provider. But there must be some to reproduce the issue, because when I target-delete them, the problem goes away and the rest of the destruction proceeds.

I was about to write the same. We came across the same issue while attempting to destroy a k8s cluster and the resources within it. In our case we have the k8s cluster managed with pulumi-azure-native and the kubernetes provider depending on the cluster outputs. We solved the issue by first deleting all k8s resources, and the azure resources in a second step.
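As a sketch, that two-step workaround could look like this with the CLI (the URN below is a placeholder for whichever resources use the kubernetes provider in your stack):

```shell
# Step 1: destroy only the resources that use the kubernetes provider,
# plus anything depending on them (URN is a placeholder).
pulumi destroy \
  --target 'urn:pulumi:dev::proj::kubernetes:core/v1:Namespace::app-ns' \
  --target-dependents

# Step 2: destroy the remaining cluster/provider resources.
pulumi destroy
```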

sebbegg avatar Oct 22 '24 11:10 sebbegg