pulumi Add --delete-before-create flag

Please add a --delete-before-create flag to pulumi up. The purpose of this is to delete resources prior to creating new ones when significant changes to the project have been made.

The example that drove me to this request is that we recently migrated away from using kubernetes manifests directly to helm charts. However, some resources have the same name on kubernetes. Pulumi though thinks they are different but unrelated, so it attempts to create the new resources through helm, which fails as the resource already exists.

While createBeforeDelete? is available, we shouldn't have to make changes to the project (index.ts) just for this, as it's a temporary requirement.

Jun 25 '19 17:06 nesl247

> However, some resources have the same name on kubernetes. Pulumi though thinks they are different but unrelated, so it attempts to create the new resources through helm, which fails as the resource already exists.

A subtle problem here (if I understand your case correctly) is that this is a different form of delete before create than what you set on a resource in its options. That delete before create says: "If you have to replace this resource, delete it and then create the new one, instead of creating the new one first and then deleting the old one". In this case, from Pulumi's point of view, it's the same resource transitioning from one state to another. For example, during a preview you would see the status of the resource show up as replace.
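To make the ordering concrete, here is a minimal sketch of what the resource option changes. It is a toy illustration only: `replaceSteps` and `ReplaceOptions` are made-up names, and nothing here is Pulumi's implementation. The only real name is the `deleteBeforeReplace` resource option itself.

```typescript
// Toy illustration of step ordering for ONE resource that must be replaced.
// `replaceSteps` and `ReplaceOptions` are hypothetical names for this sketch.
type ReplaceOptions = { deleteBeforeReplace?: boolean };

function replaceSteps(name: string, opts: ReplaceOptions = {}): string[] {
  const create = "create new " + name;
  const remove = "delete old " + name;
  // Default: create the replacement first, then delete the old resource.
  // With deleteBeforeReplace: delete the old resource first.
  return opts.deleteBeforeReplace ? [remove, create] : [create, remove];
}

console.log(replaceSteps("bucket"));
console.log(replaceSteps("bucket", { deleteBeforeReplace: true }));
```

Note that both orderings apply to a single resource that Pulumi already recognises as the same one before and after the change.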

I think in your above example, things are slightly different. Here you have replaced one way of creating the resource with a different way, but you specify properties in such a way that the same underlying resource is created inside of Kubernetes. In this world, Pulumi doesn't understand the relationship between the two resources, so during a preview you see a create followed by a delete for what is logically the same resource. Is that correct?

If so - this is a little hard to support, because of how Pulumi decides whether it needs to delete a resource. It only makes that decision after your program has finished executing, but that doesn't happen until we've created all the resources we need to create. So we can't really delete any resources until the end of your program, with the exception of resources that are being "replaced" and are marked as "delete before create".

The closest thing we have today to what I think you desire is aliases, which are a way to tell Pulumi "while these look like two different resources, I've actually just changed how I named them, so please treat them as the same". If there was an easy way to alias all the resources in your helm charts with the old names, that would solve your problem, I think (I also think this is quite hard today, unfortunately).
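As a rough sketch of why an alias changes the outcome, the toy model below matches each registration against the previous state by URN and then by any declared alias. The function `classify`, the `Registration` type and both URNs are hypothetical. The real matching happens inside the Pulumi engine and is more involved.

```typescript
// Toy model of alias matching; not the engine's actual lookup.
type Registration = { urn: string; aliases?: string[] };

function classify(previousUrns: string[], registrations: Registration[]): string[] {
  let unmatched = previousUrns.slice();
  const ops: string[] = [];
  for (const reg of registrations) {
    // Try the new URN first, then every alias the program declared.
    const candidates = [reg.urn].concat(reg.aliases || []);
    const match = candidates.filter((urn) => unmatched.indexOf(urn) >= 0)[0];
    if (match !== undefined) {
      unmatched = unmatched.filter((urn) => urn !== match);
      ops.push("same " + reg.urn);
    } else {
      ops.push("create " + reg.urn);
    }
  }
  // Anything not claimed by a registration is deleted at the end.
  for (const urn of unmatched) ops.push("delete " + urn);
  return ops;
}

const oldUrn = "urn:example:Service::old-name"; // hypothetical URN
const newUrn = "urn:example:Service::new-name"; // hypothetical URN

const withoutAlias = classify([oldUrn], [{ urn: newUrn }]);
const withAlias = classify([oldUrn], [{ urn: newUrn, aliases: [oldUrn] }]);
console.log(withoutAlias);
console.log(withAlias);
```

Without the alias the model yields a create followed by a delete. With it, the old and new names collapse into one resource and no create is attempted.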

Do I understand your problem correctly? If not - could you share some code examples with me showing the old and new ways of doing something so I can see how it breaks?

Jun 26 '19 23:06 ellismg

Yes, you are describing the scenario perfectly. If pulumi is doing a preview, then it should know everything that it has to execute. Pulumi has to calculate this before it creates anything somehow doesn’t it?

Jun 27 '19 04:06 nesl247

> Yes, you are describing the scenario perfectly. If pulumi is doing a preview, then it should know everything that it has to execute. Pulumi has to calculate this before it creates anything somehow doesn’t it?

The major problem here is that pulumi preview and pulumi update are basically independent from one another. Put another way, the output of pulumi preview is not tied to what a future pulumi update will do. It's not as if pulumi preview computes a set of actions to run and then pulumi update runs them. Instead, pulumi preview executes your program in a mode where resource operations don't actually happen, we just say what would happen and then pulumi update actually runs your program again in a mode where the underlying resource operations are allowed to happen.

Because of this there's no upfront computation during an update of all operations to schedule. Instead, your program runs, and as it does so, it generates calls into the engine saying: "Please ensure a resource of this type with these arguments exists with this URN". The engine then decides how to make that happen. If the URN matches an existing resource in the previous state, Pulumi knows it needs to do an update (which could take the form of a replace). If it has not seen a resource with that URN before, it knows it needs to create things. But this happens as your program is executing, not at the end (this is required, because if you take the output of a resource and flow it into another, we can't yet schedule the operation on the second resource until the first one has completed and we know the true underlying value for the property).

Then, after your program has finished executing and we know there are going to be no more event registrations, we go and delete any resources that we had in the previous state that we did not see a registration for.
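The ordering described above can be sketched with a small simulation. This is a toy model under stated assumptions, not engine code: `simulateUpdate`, the `Resource` shape and the URNs are all hypothetical, and unlike a real update the model keeps going after a failed create so that the full step order is visible.

```typescript
// Toy model: registrations are handled while the program runs,
// deletes only happen after the program has finished.
type Resource = { urn: string; physicalName: string };
type Step = { op: "create" | "update" | "delete"; urn: string };

function simulateUpdate(previous: Resource[], registrations: Resource[]) {
  const steps: Step[] = [];
  const errors: string[] = [];
  // Physical names that currently exist in the provider, keyed by name.
  const live: Record<string, string> = {};
  for (const r of previous) live[r.physicalName] = r.urn;
  const oldUrns = previous.map((r) => r.urn);
  const seen: string[] = [];

  for (const r of registrations) {
    seen.push(r.urn);
    if (oldUrns.indexOf(r.urn) >= 0) {
      steps.push({ op: "update", urn: r.urn }); // known URN: update in place
      continue;
    }
    steps.push({ op: "create", urn: r.urn }); // unknown URN: create
    if (live[r.physicalName] !== undefined) {
      errors.push("create " + r.urn + ": \"" + r.physicalName + "\" already exists");
    } else {
      live[r.physicalName] = r.urn;
    }
  }

  // Program finished: delete whatever was never registered.
  for (const r of previous) {
    if (seen.indexOf(r.urn) < 0) {
      steps.push({ op: "delete", urn: r.urn });
      if (live[r.physicalName] === r.urn) delete live[r.physicalName];
    }
  }
  return { steps, errors };
}

// A Service named "1" moves from a direct manifest to a helm chart (hypothetical URNs).
const result = simulateUpdate(
  [{ urn: "urn:example:manifest:Service::svc", physicalName: "1" }],
  [{ urn: "urn:example:helm:Service::svc", physicalName: "1" }],
);
console.log(result.steps.map((s) => s.op).join(", "));
console.log(result.errors);
```

Because the new URN is unknown, the create is attempted while the old resource still holds the name, which is the collision reported in this issue.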

The one place we actually perform any deletes before doing other resource operations is the case where a previous update tried to delete a resource but that deletion failed. We retry these deletes at the start of the next update.

It's possible that we could add some gesture like pulumi state pend-delete <URN> or something that marked a resource as needing to be deleted on the next update. The UX here may not be great, however. I have also not thought through 100% of the implications of doing this, especially in cases where you delete a resource that other things depended on. To get the mode you desire, we would instead have to teach the engine to run your program in a mode that was half like a preview (where we wouldn't do any creates or updates) but half like an update (where we did do deletes at the end), and then run your program again in the regular mode.
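A minimal sketch of that two-pass idea, assuming a toy state model: the first pass only collects registered URNs and deletes everything else, the second pass is the regular update. `eagerDeletePass`, `createConflicts` and the record names are hypothetical, and the sketch ignores the open question above about deleting resources that other things depend on.

```typescript
// Toy model of "half preview, half update" followed by a normal update.
type Res = { urn: string; physicalName: string };

// Pass 1: run the program without creating anything, then delete every
// resource from the previous state that the program did not register.
function eagerDeletePass(previous: Res[], registrations: Res[]): Res[] {
  const registered = registrations.map((r) => r.urn);
  return previous.filter((r) => registered.indexOf(r.urn) >= 0);
}

// Pass 2: the regular update. Returns physical names whose create would collide.
function createConflicts(state: Res[], registrations: Res[]): string[] {
  const known = state.map((r) => r.urn);
  const taken = state.map((r) => r.physicalName);
  const conflicts: string[] = [];
  for (const r of registrations) {
    if (known.indexOf(r.urn) >= 0) continue; // existing URN: update, not create
    if (taken.indexOf(r.physicalName) >= 0) conflicts.push(r.physicalName);
    else taken.push(r.physicalName);
  }
  return conflicts;
}

// Hypothetical state and program: one record is replaced by another on the same host.
const previous: Res[] = [{ urn: "urn:example:Record::old", physicalName: "cdn.example.com" }];
const program: Res[] = [{ urn: "urn:example:Record::new", physicalName: "cdn.example.com" }];

const today = createConflicts(previous, program);
const twoPass = createConflicts(eagerDeletePass(previous, program), program);
console.log(today, twoPass);
```

In this model the single-pass update reports a collision on the shared name, while the two-pass variant reports none because the stale resource is gone before the create runs.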

I'm interested in getting @pgavlin's thoughts here. During an in-person discussion, he had posited that something like pulumi state taint <URN> might help, but I think that would still not do what you want here. You don't want to mark a resource as needing replacement, you really want to say: "Please, just delete this resource eagerly before the next update begins".

Jun 27 '19 22:06 ellismg

I would definitely think changing Pulumi to execute the plan that is discovered in the preview would make the most sense. While it may require a lot of work, it would make the most sense, and also make the preview more reliable. This way anyone running the preview would know the exact order in which all resources will be acted upon, and what the actions are.

It is actually a little scary that preview is really meaningless.

Jun 27 '19 22:06 nesl247

> I would definitely think changing Pulumi to execute the plan that is discovered in the preview would make the most sense.

This is tracked in https://github.com/pulumi/pulumi/issues/2318.

> Please add a --delete-before-create flag to pulumi up. The purpose of this is to delete resources prior to creating new ones when significant changes to the project have been made.

I've read through this whole issue a couple of times, but still don't quite follow what the ask is exactly. What resources does this delete? Is this related to deleteBeforeReplace (and thus limited to things which will be replaced) or unrelated? (As far as I can tell, it's unrelated.)

> It's possible that we could add some gesture like pulumi state pend-delete <URN>

Is this the feature you are looking for?

Jan 15 '20 04:01 lukehoban

This deletes all resources that pulumi detects need to be deleted because of a change in the plan. For example:

1. Create a K8s Service named "1"
2. Remove directly created K8s Service named "1" in favor of it being created via helm in pulumi
3. Pulumi detects it needs to delete resource "1" because the code changed, but the real result is the same resource
4. Pulumi fails because it can't create a resource that already exists

This is why we need a --delete-before-create so that any resources that would indeed be deleted by pulumi, but only after it's created all the new resources, are actually deleted first. Currently the only real way to do this is to destroy the resources first, then create them, which requires git work (changing branches or whatever) in order to access the original resources to destroy them, then apply the new ones.

Jan 15 '20 15:01 nesl247

FWIW I'm running into the exact same issue on a number of AWS resources. Take VPC endpoints for example. If I make a change that forces a delete/create (not a replace), such as changing the resource name, and I don't first destroy the resource via code and then add the 'new' resource, the deployment will fail due to a private DNS zone name collision. This isn't the only place it shows up, just the first one off the top of my head.

Feb 28 '20 18:02 solidDoWant

I think I'm running into this now when attempting to upgrade helm charts from versions that support k8s < 1.15 to versions that support k8s >= 1.16. For example, this chart made changes to support the deprecation of certain beta APIs: https://github.com/helm/charts/commit/c5989490cac32d4ca350c5d2f8928dfebff111b3#diff-183249a162a69eec004fba849babdebd

As a result Pulumi output looks like this:

 Type                                                           Name                                  Status                  Info
     pulumi:pulumi:Stack                                            aws-arcus-kubernetes-casey-robertson  **failed**              1 error
     ├─ kubernetes:helm.sh:Chart                                    metrics-server
 +   │  └─ kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding  metrics-server:system:auth-delegator  **creating failed**     1 error


  kubernetes:rbac.authorization.k8s.io:ClusterRoleBinding (metrics-server:system:auth-delegator):
    error: resource metrics-server:system:auth-delegator was not successfully created by the Kubernetes API server : clusterrolebindings.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" already exists

Jun 09 '20 20:06 casey-robertson

> For example, this chart made changes to support the deprecation of certain beta APIs: helm/charts@c598949#diff-183249a162a69eec004fba849babdebd

The diff linked there does not seem to involve the same resource that is triggering the error here: the diff is about -auth-reader whereas the error is about auth-delegator (and in fact the error is about a clusterrolebinding, not a rolebinding). If the situation outlined in that diff was what was happening, this should just work as is, as those two API versions are aliased to be treated as the same resource and the update would just proceed as normal. But I suspect something different is happening here. It seems like perhaps the chart didn't use to include a resource, you added that resource some other way, and then the chart also added it. Is that possible? If so, you will need to either use a transformation to remove the resource from the chart, or remove it from wherever else you are creating it. It is not immediately clear to me that delete-before-create is at the root of this issue, though more details would be needed to identify that for sure.

Jun 10 '20 00:06 lukehoban

I gave an incomplete example and made a bad reference. Here's the relevant code in the chart and the history:

https://github.com/helm/charts/commits/master/stable/metrics-server/templates/auth-delegator-crb.yaml

Is it likely this change? https://github.com/helm/charts/commit/106969f251ae2e4d2315f660cf8600b66cbadf57#diff-f1ac9d24aa4c3553a0bd62085662bc53

Jun 10 '20 17:06 casey-robertson

I'm running into this issue now.

I have 3 existing NS records in CloudFlare.

I now want to replace these with a single CNAME record.

Previewing update (dev):
     Type                               Name                  Plan       Info
     pulumi:pulumi:Stack                web-authn-dev                    2 messages
 +   ├─ cloudflare:index:Record         web-authn-cdn-record  create     
 -   ├─ digitalocean:index:Domain       web-authn-dev         delete     
 -   ├─ digitalocean:index:Cdn          web-authn             delete     
 -   ├─ digitalocean:index:Certificate  web-authn-cert        delete     
 -   ├─ cloudflare:index:Record         web-authn-ns1-record  delete     
 -   ├─ cloudflare:index:Record         web-authn-ns3-record  delete     
 -   └─ cloudflare:index:Record         web-authn-ns2-record  delete

When I try to apply the change, Pulumi attempts to create the new CNAME record before deleting the existing NS records, and Cloudflare does not allow it:

Do you want to perform this update? yes
Updating (dev):
     Type                        Name                  Status                  Info
     pulumi:pulumi:Stack         web-authn-dev         **failed**              1 error; 2 messages
 +   └─ cloudflare:index:Record  web-authn-cdn-record  **creating failed**     1 error
 
Diagnostics:
  pulumi:pulumi:Stack (web-authn-dev):
    error: update failed
 
  cloudflare:index:Record (web-authn-cdn-record):
    error: Failed to create record: error from makeRequest: HTTP status 400: content "{\"result\":null,\"success\":false,\"errors\":[{\"code\":81056,\"message\":\"NS records already exist with that host.\"}],\"messages\":[]}"

I need to somehow tell Pulumi to delete the existing resources which were removed from the program before creating new resources.

Jul 06 '20 23:07 snipebin

Thanks @snipebin - that is a good concrete example of this scenario.

Today, you would likely need to deploy this in two steps - first deploy with the NS records removed, then deploy again to add the CNAME. That is definitely not ideal though - so it is good motivation to add a feature like the one tracked in this issue.

Jul 06 '20 23:07 lukehoban

Any suggestion on how to implement this in CI/CD pipelines?

Jul 10 '20 01:07 solidDoWant

I also just ran into this when updating the resource_name of a Helm chart. In a Pulumi preview, it shows that it will delete the existing kubernetes resources and then create new ones in their place. As others have pointed out, it fails when any of the resources collide by the unique name for that resource. In my case it was a PersistentVolumeClaim. The resource_name and the PVC name are set in separate user inputs so it's totally possible to define a PVC twice in Pulumi and have them collide.

Pulumi's default behavior of deleting after creating is definitely correct. If, for example, I had a pool of nodes and I was going to update these nodes then creating the new nodes before deleting the old ones is always what we want to avoid downtime.

In some cases, you want to delete before creating. It's specific to the scenario and specific to what the user wants. A flag --delete-before-create doesn't seem to be the pulumi way of codifying both infrastructure and its day-two behavior. I feel like Pulumi has phases. The create phase and then the cleanup phase. Maybe there needs to be a pre-cleanup phase before the create phase. Meaning that if a user wants their resources to be deleted before any resources are created we should be able to opt those resources into that behavior. We already have delete_before_replace so maybe a delete_before_create makes sense here.

Sep 22 '20 01:09 iridian-ks

This is definitely needed, it's particularly an issue with Helm Charts as mentioned.

Dec 13 '20 01:12 elucidsoft

I have the exact same scenario as @snipebin , with Cloudflare.

Jul 10 '21 14:07 ceefour

+1. I'm using DNSSEC from Cloudflare and whenever the project name is changed, pulumi up tries to create existing DNSSEC records and gets an error response from Cloudflare saying DNSSEC is already enabled.

Jul 25 '21 22:07 berkant

The deleteBeforeReplace is only useful if you keep the name of the resource, if you do bigger refactors that won't work as it might try to create the new resource which just has a different name from the point of view of pulumi, but not a different name in the actual cloud provider.

I can understand the downtime thing, however I think we should have the option to opt out of this behavior in case we need/want to. Actually that would be the default for us, as we always set the resource names. We don't rely on the generated names from pulumi because it gets quite complicated when you create resources in one repo and want to reference that resource in another one, etc.

Aug 25 '21 12:08 renannprado

> if you do bigger refactors that won't work as it might try to create the new resource which just has a different name from the point of view of pulumi, but not a different name in the actual cloud provider.

Note that this is where aliases can be useful, and should generally be able to help ensure that two resources which really are meant to be the "same" before and after a refactor are treated as such by Pulumi.

https://www.pulumi.com/docs/intro/concepts/resources/#aliases

Aug 25 '21 15:08 lukehoban

> > if you do bigger refactors that won't work as it might try to create the new resource which just has a different name from the point of view of pulumi, but not a different name in the actual cloud provider.
>
> Note that this is where aliases can be useful, and should generally be able to help ensure that two resources which really are meant to be the "same" before and after a refactor are treated as such by Pulumi.
>
> pulumi.com/docs/intro/concepts/resources/#aliases

I can't speak to how @renannprado feels, but in my case, while it can be useful here, it's absolutely not what I'd want to use. I don't want to litter my code base with extra code because pulumi does not support common enough use cases. This is the same reason I do not like having to do deleteBeforeReplace.

Aug 25 '21 15:08 nesl247

> > if you do bigger refactors that won't work as it might try to create the new resource which just has a different name from the point of view of pulumi, but not a different name in the actual cloud provider.
>
> Note that this is where aliases can be useful, and should generally be able to help ensure that two resources which really are meant to be the "same" before and after a refactor are treated as such by Pulumi.
>
> https://www.pulumi.com/docs/intro/concepts/resources/#aliases

That might sound like a solution when you have a few resources where you have to set it, but it's not practical to do this in a codebase handling lots of resources.

Aug 26 '21 07:08 renannprado

Another good example that I have just gone through:

I have a large repository which I use to provision github repositories. As part of that, I also invite users to our organization.

What happens now is that we're out of seats in our organization to invite new people. After going through the current users, we found someone that could be removed and so we did. At the same time, we added a new user, but because Pulumi attempts to first invite the new user and only then delete the current one, it fails. Why? Because I need to first remove one user (to get a spare seat) and then invite the new one.

Aug 30 '21 12:08 renannprado

I have a similar but different variation on the theme of helm charts: We're moving away from the helm.Chart resource type and towards the helm.Release type, which also fails to create if any old resource exists, no matter what its name is.

I would love if there was a way to define a pseudo-resource that refers to the deletion of another resource - so that the new helm.Release could have something like depends_on=[pulumi.deleted_resource(name="other-chart", type="[...]Chart")]

Jan 19 '22 19:01 antifuchs

Also very handy if you're reaching quota limits and you can't create new resources because it only deletes the old ones after attempting to create new ones.

Apr 18 '22 11:04 smasala

We have a need for this as well! Running into the following situation with pulumi snowflake.

1. Create ownership grant on database A for role blah with name: "grant_ownership_on_x_to_role_blah"
2. Apply the infrastructure
3. Refactor grant resource to use a different name "grant_role_blah_ownership_on_x"
4. Apply the infrastructure

The apply in step 4 first creates the new resource and afterwards deletes the old resource. Under the hood this results in the following SQL statements:

GRANT OWNERSHIP ON DATABASE X TO ROLE BLAH -- this was already the case in the first place, so no-op
GRANT OWNERSHIP ON DATABASE X TO ROLE ACCOUNTADMIN -- this reverts the result of the previous operation, setting the default role and leaving the database in an invalid state
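The effect of the ordering on the final owner can be sketched with a toy model. The `applyOps` function and the operation names are hypothetical. The only behaviour assumed is the one described above: creating the grant sets the owner to BLAH, and deleting a grant hands ownership back to ACCOUNTADMIN.

```typescript
// Toy model of one database's owner across the two operations in step 4.
type Op = "create-new-grant" | "delete-old-grant";

function applyOps(ops: Op[]): string {
  let owner = "BLAH"; // state after the first apply (step 2)
  for (const op of ops) {
    if (op === "create-new-grant") {
      owner = "BLAH"; // GRANT OWNERSHIP ... TO ROLE BLAH
    } else {
      owner = "ACCOUNTADMIN"; // deleting a grant reverts to the default role
    }
  }
  return owner;
}

const createThenDelete = applyOps(["create-new-grant", "delete-old-grant"]);
const deleteThenCreate = applyOps(["delete-old-grant", "create-new-grant"]);
console.log(createThenDelete, deleteThenCreate);
```

In this model only the delete-first order leaves BLAH as the owner, which is the result the refactor was meant to preserve.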

Now this can be more-or-less solved by adding delete_before_replace and replace_on_changes resource options, but this requires adding these to all resources. So it would be really nice if we could control this behaviour on a global level, so that deletes always run before creates, wherever they happen.

May 16 '22 12:05 mvgijssel

After a customer discussion, I had an idea for something here which could help.

Could we feasibly create a provider that takes a type token and a resource provider and essentially does the inverse of the operation?

For example:

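import * as pulumi from "@pulumi/pulumi"
import * as aws from "@pulumi/aws"

// pulumi.Delete is the hypothetical resource proposed here; it does not exist today.
const awsProvider = new aws.Provider("something")

const deleteFoo = new pulumi.Delete("foo", {
  token: "aws.s3.Bucket",
}, { provider: awsProvider })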

This essentially allows you to perform the inverse operation on any resource you like.

Jul 26 '22 18:07 jaxxstorm

I am facing the same problem when renaming a resource for an inline policy of an IAM user. It throws an error saying that the character limit was exceeded for the user.

waiting for --delete-before-create flag

Sep 14 '22 08:09 NagarjunARIS

would be super useful to avoid hacks with dynamic providers in scenarios where you need to delete a resource before any other resource gets created due to a conflict

Nov 03 '22 13:11 mortaelth

+1, this would be useful when switching from the aws to awsnative provider too

Nov 07 '22 20:11 jwarwick-delfi

👍

Dec 05 '22 16:12 omercnet