pulumi-cloudflare
DNS Record state discrepancy
Describe what happened
Yesterday, Cloudflare had an outage that caused an error when creating a DNS record.
On retry, Pulumi does not seem to be aware that this record was actually created and tries to create it again, which causes the update to fail.
Running a refresh did not help.
Sample program
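using Pulumi.Cloudflare;

// zoneId, name and domainVerificationId are defined earlier in the program.
new Pulumi.Cloudflare.Record(name, new RecordArgs
{
    ZoneId = zoneId,
    Name = name,
    Content = domainVerificationId,
    Type = "TXT",
    Ttl = 1,          // 1 means "automatic" TTL in Cloudflare
    Proxied = false,
});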
Log output
Original error:
- cloudflare:index:Record asuid.app-ingestion.eastus2.pr1242 creating (1s) error: error reading from server: EOF
- cloudflare:index:Record asuid.app-ingestion.eastus2.pr1242 creating failed error: error reading from server: EOF
Some other records we tried to create gave a slightly different error:
- cloudflare:index:Record asuid.app-ingestion.pr1242 creating (1s) error: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39483: connect: connection refused"
- cloudflare:index:Record asuid.app-ingestion.pr1242 creating failed error: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39483: connect: connection refused"
Now, every time we retry the update (both before and after a refresh):
cloudflare:index:Record (asuid.app-ingestion.eastus2.pr1242): error: sdk-v2/provider2.go:385: sdk.helper_schema: expected DNS record to not already be present but already exists: [email protected] error: 1 error occurred: * expected DNS record to not already be present but already exists
Affected Resource(s)
cloudflare:index:Record
Output of pulumi about
We use https://github.com/pulumi/actions, so this is not applicable.
Additional context
This bug may not be easy to reproduce without mocking Cloudflare responses, but it points to a bigger underlying issue that we have discussed with Pulumi numerous times. When Pulumi fails to get a successful response from a provider, it seems to assume the resource was not created, while often it was (or will be shortly after). Instead of verifying this on the next run (or during a refresh), Pulumi tries to create the resource again, which either fails or creates a duplicate. We were assured that this scenario had been taken into account, but we still see cases like this one where it is clearly a problem.
Unfortunately, there is very little we can do in such a situation: the resource exists, the deployment won't succeed, and Pulumi won't clean the resource up because it is not aware of it (and in many cases we cannot destroy the stack anyway). We're aware that we could import the resource, etc., but these deployments are all part of an automated solution that sometimes deploys hundreds of times per day, and we use Pulumi precisely to avoid having to handle such cases manually.
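To make the failure mode described above concrete, here is a minimal, self-contained C# sketch. It does not use Pulumi or the Cloudflare SDK: the in-memory zone list and the CreateLosingResponse, Create and CreateOrAdopt local functions are hypothetical stand-ins, and the duplicate check in Create is only a simplified model of the provider's "expected DNS record to not already be present" error (the real provider's matching rules may differ). It shows why a blind retry fails after a lost response, and how a lookup-first "create or adopt" step would recover instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical in-memory stand-in for a Cloudflare zone (not the real API).
var zone = new List<(string Id, string Name, string Type, string Content)>();
var nextId = 1;

// Simulates the outage: the record is stored server-side,
// but the client never receives the response.
string CreateLosingResponse(string name, string type, string content)
{
    zone.Add(($"rec-{nextId++}", name, type, content));
    throw new EndOfStreamException("error reading from server: EOF");
}

// Simplified, hypothetical model of the provider's duplicate check on create.
string Create(string name, string type, string content)
{
    if (zone.Exists(r => r.Name == name && r.Type == type && r.Content == content))
        throw new InvalidOperationException(
            "expected DNS record to not already be present but already exists");
    var id = $"rec-{nextId++}";
    zone.Add((id, name, type, content));
    return id;
}

// Lookup-first retry: adopt a matching record if one exists, otherwise create it.
string CreateOrAdopt(string name, string type, string content)
{
    var index = zone.FindIndex(r => r.Name == name && r.Type == type && r.Content == content);
    return index >= 0 ? zone[index].Id : Create(name, type, content);
}

// First attempt: the record is created, but the response is lost.
try { CreateLosingResponse("asuid.example", "TXT", "verification-id"); }
catch (EndOfStreamException) { /* nothing is recorded in state */ }

// A blind retry hits the duplicate check...
var blindRetryFailed = false;
try { Create("asuid.example", "TXT", "verification-id"); }
catch (InvalidOperationException) { blindRetryFailed = true; }

// ...while a lookup-first retry adopts the existing record.
var adoptedId = CreateOrAdopt("asuid.example", "TXT", "verification-id");
Console.WriteLine($"{blindRetryFailed} {adoptedId} {zone.Count}"); // prints "True rec-1 1"
```

Matching on name, type and content assumes those fields identify the record uniquely; that is an assumption of this sketch, not a description of how Pulumi or the Cloudflare provider actually behave.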
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).