pulumi-cloudflare DNS Record state discrepancy

Open mikocot opened this issue 5 months ago • 2 comments

Describe what happened

Yesterday Cloudflare had an outage that caused an error while creating a DNS record.

On retry, Pulumi does not seem to be aware that the record was actually created and tries to add it again, which causes the update to fail.

Refresh did not help.

Sample program

new Pulumi.Cloudflare.Record(name, new RecordArgs
{
    ZoneId = zoneId,
    Name = name,
    Content = domainVerificationId,
    Type = "TXT",
    Ttl = 1, // 1 means "automatic" TTL in Cloudflare
    Proxied = false,
});

Log output

Original error:

cloudflare:index:Record asuid.app-ingestion.eastus2.pr1242 creating (1s) error: error reading from server: EOF

cloudflare:index:Record asuid.app-ingestion.eastus2.pr1242 creating failed error: error reading from server: EOF

Some other records we tried to create gave a slightly different error:

cloudflare:index:Record asuid.app-ingestion.pr1242 creating (1s) error: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39483: connect: connection refused"

cloudflare:index:Record asuid.app-ingestion.pr1242 creating failed error: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39483: connect: connection refused"

Now, every time we retry the update (both before and after a refresh):

cloudflare:index:Record (asuid.app-ingestion.eastus2.pr1242): error: sdk-v2/provider2.go:385: sdk.helper_schema: expected DNS record to not already be present but already exists: [email protected] error: 1 error occurred: * expected DNS record to not already be present but already exists

Affected Resource(s)

cloudflare:index:Record

Output of `pulumi about`

We use https://github.com/pulumi/actions, so it's irrelevant.

Additional context

This bug might not be easy to reproduce without mocking Cloudflare responses, but it shows a bigger underlying issue that we have discussed with Pulumi numerous times. It seems that when Pulumi fails to get a successful response from a provider, it assumes the resource was simply not created, while often it was, or will be soon after. Instead of verifying this on the next run (or during a refresh), Pulumi then tries to create the resource again, which either fails or produces a duplicate. We were assured that this scenario has been taken into account, but we still see cases like this one where it is clearly a problem.
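To make the scenario concrete, here is a minimal, self-contained simulation in plain Python. It does not use Pulumi or the Cloudflare client; `FakeDnsApi`, `naive_create`, `reconciling_create`, and the record name are all hypothetical names invented for illustration. It only sketches the behaviour described above, under the assumption that the server applies the create but the reply is lost, and it makes no claim about how the Pulumi engine or the provider is actually implemented.

```python
class AlreadyExists(Exception):
    """Raised by the fake API when an identical record is already present."""


class ResponseLost(Exception):
    """Raised when the server applied the change but the reply never arrived."""


class FakeDnsApi:
    """In-memory stand-in for a DNS API (hypothetical, not the Cloudflare client)."""

    def __init__(self):
        self.records = {}             # record id -> (name, type, content)
        self.drop_next_reply = False  # simulate an outage on the next create
        self._next_id = 1

    def find(self, name, rtype, content):
        """Return the id of a matching record, or None if there is none."""
        for rid, rec in self.records.items():
            if rec == (name, rtype, content):
                return rid
        return None

    def create(self, name, rtype, content):
        if self.find(name, rtype, content) is not None:
            raise AlreadyExists(name)
        rid = f"rec-{self._next_id}"
        self._next_id += 1
        self.records[rid] = (name, rtype, content)
        if self.drop_next_reply:
            self.drop_next_reply = False
            # The record now exists, but the caller never learns its id.
            raise ResponseLost(name)
        return rid


def naive_create(api, state, name, rtype, content):
    """Create unconditionally; state is only updated after a successful reply."""
    state[name] = api.create(name, rtype, content)


def reconciling_create(api, state, name, rtype, content):
    """Look for a matching record first and adopt it instead of re-creating it."""
    existing = api.find(name, rtype, content)
    state[name] = existing if existing is not None else api.create(name, rtype, content)
```

With `drop_next_reply` set, the first `naive_create` call raises `ResponseLost` and leaves `state` empty even though the record exists. Every later `naive_create` call then raises `AlreadyExists`, which is the loop we are stuck in. `reconciling_create` adopts the existing record and the state converges.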

Unfortunately, in such a situation there is very little we can do: the resource has been created, so the deployment won't work, and Pulumi won't clean it up because it is not aware of its existence (and in many cases we cannot destroy the stack anyway). We're aware that we could import the resource and so on, but these deployments are all part of an automated solution that deploys sometimes hundreds of times per day, and we use Pulumi precisely to avoid having to manage such cases by hand.

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

Sep 19 '24 08:09 mikocot
