dnscontrol icon indicating copy to clipboard operation
dnscontrol copied to clipboard

FEATURE: New flag --delete-unmanaged: Remove unmanaged domains & zones (implements #1976)

Open Yannik opened this issue 2 years ago • 24 comments

@tlimoncelli

This is an implementation of the feature we discussed in #1976.

The feature can be used like this:

dnscontrol preview --delete-unmanaged-domains
dnscontrol preview --delete-unmanaged-zones
dnscontrol preview --delete-unmanaged # same as --delete-unmanaged-domains --delete-unmanaged-zones

The output looks like this:

Checking Domain Removal:
Removing domain mydomain.com from provider hosting.de
Checking Zone Removal:
Removing zone mydomain.com from provider hosting.de

Yannik avatar Feb 02 '23 14:02 Yannik

  1. Deleting unmanaged domains is only possible if exactly one domain registrar is being used (otherwise the command throws an error)

  2. Deleting unmanaged zones is possible with any number of dns providers, with the limitation that all dns providers need to be configured to serve all the configured zones.

Actually, after reviewing this again, the code I submitted in this PR was the implementation of a different approach.

I will go over this whole thing again tomorrow. (I'm currently in GMT+8, so it's really late here :-)) Thanks for the code review, I will try to address your comments tomorrow as well.

Yannik avatar Feb 02 '23 17:02 Yannik

I will go over this whole thing again tomorrow. (I'm currently in GMT+8, so it's really late here :-))

Sleep is more important that dnscontrol! I promise!

tlimoncelli avatar Feb 02 '23 21:02 tlimoncelli

Hi @tlimoncelli

I have gone over this again and pushed a new version of my PR.

Some of the things from my OP were incorrect and based on wrong assumptions. I think I got a bit caught up with understanding go and lost sight of the bigger picture.

With this new version any number of registrars/dns providers can be correctly checked for unmanaged domains/zones. (Therefore, the limit of one registrar has been removed)

We can now also use --delete-unmanaged which acts as an alias for the rather unwieldy --delete-unmanaged-domains --delete-unmanaged-zones.

It would be great to get a new review on this. I have also added documentation for this feature with my new commits.

Regarding the output of the command, I think there is room for improvement. If you have any good ideas, please bring them forward!

Yannik avatar Feb 03 '23 15:02 Yannik

Looking good so far!

Some notes:

  1. My "preview" takes at least a minute to run through all my domains. Waiting for the zone/domain report at the end was a drag. Maybe the domain/zone deletion should happen first? (Or maybe this is whyOr maybe this means we should implement the sub-commands instead of putting this in preview/push.

  2. I'm getting panics when I run the code:

I got the panics with all the providers I tested it with: NONE, CSCGLOBAL, NAMEDOTCOM

The panic happens on this line:

deployedDomains, err := domainLister.ListDomains()

Here's the panic:

dnscontrol preview --delete-unmanaged
Checking Domain Removal:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x178e7a4]

goroutine 1 [running]:
github.com/StackExchange/dnscontrol/v3/commands.run({{{{0x246267e, 0xc}, {0x0, 0x0}, 0x0, {{...}, {...}, 0x0}}, {0x0, 0x0}}, ...}, ...)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:246 +0xfe4
github.com/StackExchange/dnscontrol/v3/commands.Preview(...)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:119
github.com/StackExchange/dnscontrol/v3/commands.glob..func7.1(0xc00038e160?)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:26 +0xa6
github.com/urfave/cli/v2.(*Command).Run(0xc00038e160, 0xc0006978c0, {0xc0003ba840, 0x2, 0x2})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:273 +0xa42
github.com/urfave/cli/v2.(*Command).Run(0xc00038edc0, 0xc000697740, {0xc0000401b0, 0x3, 0x3})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:266 +0xc97
github.com/urfave/cli/v2.(*App).RunContext(0xc0000423c0, {0x26e0b20?, 0xc000046098}, {0xc0000401b0, 0x3, 0x3})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
github.com/StackExchange/dnscontrol/v3/commands.Run({0xc0003dd560?, 0x2460c00?})
	/Users/tlimoncelli/git/dnscontrol/commands/commands.go:70 +0x325
main.main()
	/Users/tlimoncelli/git/dnscontrol/main.go:34 +0x134

tlimoncelli avatar Feb 03 '23 17:02 tlimoncelli

@tlimoncelli Regarding 2: this is surely due to ListDomains() not being implemented for these providers. However, it shouldn't crash but throw an error.

			domainLister, ok := registrar.(providers.DomainLister)
			if !ok {
				fmt.Errorf("provider type of %s cannot list domains\n", registrarName)
			}
			deployedDomains, err := domainLister.ListDomains()

Do you have any idea why this is not working? I copied this from the getZone ZoneLister check. I'm not 100% certain how this check works, I just assumed it would check whether the methods of the interface are implemented.

You could also try --delete-unmanaged-zones, and skip checking domains for now.

Yannik avatar Feb 03 '23 17:02 Yannik

I don't understand why that's happening. I read the code and though it should skip the provider properly. :-(

Smart idea to try --delete-unmanaged-zones. That finds a number of unmanaged zones (they were moved to a different provider, but we never cleaned up the old location). However it crashes on a different location.

Checking Zone Removal:
Removing zone stackpromos.com from provider gcloud
Removing zone miniprofiler.com from provider gcloud
Removing zone stackoverflow.co from provider gcloud
Removing zone stackenterprise.co from provider gcloud
Removing zone asksf.com from provider r53
Removing zone stackoverflow.net from provider r53
Removing zone aws.ds.stackexchange.com from provider r53
Removing zone stackoverflow.co from provider r53
Removing zone bosun.org from provider r53
Removing zone selfserve-stackexchange.com from provider r53
Removing zone apptivate.ms from provider r53
Removing zone stackpromos.com from provider r53
Removing zone stackenterprise.co from provider r53
Removing zone miniprofiler.com from provider r53
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x178e1ee]

goroutine 1 [running]:
github.com/StackExchange/dnscontrol/v3/commands.run({{{{0x24626fe, 0xc}, {0x0, 0x0}, 0x0, {{...}, {...}, 0x0}}, {0x0, 0x0}}, ...}, ...)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:276 +0xa2e
github.com/StackExchange/dnscontrol/v3/commands.Preview(...)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:119
github.com/StackExchange/dnscontrol/v3/commands.glob..func7.1(0xc00032ab00?)
	/Users/tlimoncelli/git/dnscontrol/commands/previewPush.go:26 +0xa6
github.com/urfave/cli/v2.(*Command).Run(0xc00032ab00, 0xc0000d0480, {0xc00014d6a0, 0x2, 0x2})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:273 +0xa42
github.com/urfave/cli/v2.(*Command).Run(0xc00019c2c0, 0xc0000d0300, {0xc000110180, 0x3, 0x3})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:266 +0xc97
github.com/urfave/cli/v2.(*App).RunContext(0xc0000421e0, {0x26e0be0?, 0xc000122008}, {0xc000110180, 0x3, 0x3})
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
	/Users/tlimoncelli/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:309
github.com/StackExchange/dnscontrol/v3/commands.Run({0xc000047780?, 0x2460c80?})
	/Users/tlimoncelli/git/dnscontrol/commands/commands.go:70 +0x325
main.main()
	/Users/tlimoncelli/git/dnscontrol/main.go:34 +0x134

tlimoncelli avatar Feb 03 '23 17:02 tlimoncelli

@tlimoncelli

It doesn't crash for me (otherwise I wouldn't have submitted the PR :D).

Can you confirm the line of code it is crashing on? (Line 276 in my branch is for _, zone := range deployedZones {).

Do you have more providers than gcloud and r53 configured?

Yannik avatar Feb 04 '23 01:02 Yannik

@tlimoncelli

I don't understand why that's happening. I read the code and though it should skip the provider properly. :-(

I have fixed that with my latest commit. The if !ok branch was simply missing a return/continue to stop processing the current provider.

Yannik avatar Feb 04 '23 17:02 Yannik

@tlimoncelli

I don't understand why that's happening. I read the code and though it should skip the provider properly. :-(

I have fixed that with my latest commit. The if !ok branch was simply missing a return/continue to stop processing the current provider.

Looks like that fixed it! Thanks!

tlimoncelli avatar Feb 04 '23 20:02 tlimoncelli

@tlimoncelli did that also fix the second crash you experienced, or is that still present?

Yannik avatar Feb 04 '23 20:02 Yannik

rebased on master

Yannik avatar Feb 07 '23 12:02 Yannik

Sadly CSCGLOBAL is still listing every zone :(

tlimoncelli avatar Feb 07 '23 13:02 tlimoncelli

@tlimoncelli

Sadly CSCGLOBAL is still listing every zone :(

Unfortunately I cannot solve that without more information. I listed some things to look at in this comment.

Additionally, could you create a fresh creds file with cscglobal only, and a dnsconfig.js with only a single (already deployed!) domain configured, to see if the issue can be reproduced in this minimal testcase? Maybe also post these configs here.

Yannik avatar Feb 07 '23 13:02 Yannik

Hi @Yannik

Good news! The problem with CSCGLOBAL is fixed. It wasn't your code. It was our dnsconfig.js!

  • A macro we use to specify a parked domain was missing the DnsProvider(DSP_CSCGLOBAL) statement in the D(). Therefore it wasn't managed!
  • A few domains had been added to our account without adding them to dnsconfig.js. Maybe we need a --report-unmanaged-zones flag!

Not so good news:

It took me quite a long time to trace the code and understand what it was trying to do. The layers of nested loops is elegant but difficult to understand unless you really know the structs well. In fact, I wasn't sure what CSCGOBAL's problem was until I started reimplementing the algorithm. Now the logic is more visible to the reader.

The results of my rewrite are here: https://github.com/StackExchange/dnscontrol/pull/2061

I'd like you to consider rewriting the domains code the same way.

Thanks!

tlimoncelli avatar Feb 07 '23 19:02 tlimoncelli

Hi @tlimoncelli

I'm glad you found the issue!

I have also reviewed #2061: Personally, I feel like the setFromStrings is too much boilerplate (that needs brainpower to be understood as well) just to be able to use the Difference function. What about a compromise: Extract getting the managed/deployed zones from the loop, but still loop over the deployedZones and use the slices.Contains(managedZones, zone) function to check if it is managed or not? I think this would be a good solution.

Yannik avatar Feb 07 '23 20:02 Yannik

Using slices.Contains instead of sets seems reasonable! Show me what that would look like.

tlimoncelli avatar Feb 07 '23 20:02 tlimoncelli

@tlimoncelli I have changed some things to make the code easier to understand (i.e. replaced the loop over DnsProviderNames with a simple check if the relevant key exists, split up if statements, ordered returns more logically). I've also added comments to explain what each function does.

In addition, I've improved the output format a bit to match the other output of dnscontrol and make it more readable.

IMO, the code is in pretty good shape to be merged now. :)

Yannik avatar Feb 08 '23 07:02 Yannik

Thank you for showing what the compromise would look like. I was willing to see what you were proposing but sadly the new revision has the same issues as before. Specifically:

  1. It is impossible to understand the intended algorithm by looking at the code.
  2. It doesn't work for CSCGLOBAL.

These two issues are related: I can't debug the CSCGLOBAL problem because I can't create a mental model of what the intended algorithm is and compare that to the code that I see on my screen.

Another issue is that this code isn't testable. My proposed alternative is broken down into smaller functions, each could have a unit test. A unit test for DeleteUnmanagedZones/DeleteUnmanagedDomains in https://github.com/StackExchange/dnscontrol/pull/2050 would be "all or nothing" which would be brittle.

I've offered you a PR showing how it should be implemented. You're going to have to adopt that PR's method (i.e. adapt DeleteUnmanagedDomains and DeleteUnmanagedZones to by like DeleteUnmanagedDomains in https://github.com/StackExchange/dnscontrol/pull/2061) or I will have to reject this PR because it fails to work on CSCGLOBAL.

I have spent too much time trying to debug the CSCGLOBAL issue and can not dedicate more time to it.

My recommended next steps:

  • Adopt the design of DeleteUnmanagedDomains in https://github.com/StackExchange/dnscontrol/pull/2061) for DeleteUnmanagedDomains in https://github.com/StackExchange/dnscontrol/pull/2061
  • Replace the use of mapset and use slices.Contains() as you previously suggested.

Tom

tlimoncelli avatar Feb 08 '23 14:02 tlimoncelli

@tlimoncelli

It doesn't work for CSCGLOBAL.

I'm a bit confused. I thought the CSCGLOBAL issue was caused by not wrapping the provider in DnsProvider(..) in your dnsconfig.js?

Yannik avatar Feb 08 '23 14:02 Yannik

@tlimoncelli

It doesn't work for CSCGLOBAL.

I'm a bit confused. I thought the CSCGLOBAL issue was caused by not wrapping the provider in DnsProvider(..) in your dnsconfig.js?

That was required to get https://github.com/StackExchange/dnscontrol/pull/2061 working. It still has the same bug in https://github.com/StackExchange/dnscontrol/pull/2050

tlimoncelli avatar Feb 08 '23 14:02 tlimoncelli

@tlimoncelli

It doesn't work for CSCGLOBAL.

I'm a bit confused. I thought the CSCGLOBAL issue was caused by not wrapping the provider in DnsProvider(..) in your dnsconfig.js?

That was required to get #2061 working. It still has the same bug in #2050

Could you kindly check that you really tested this with a corrected dnsconfig.js? Would it be possible for you to post a snippet of the dnscontrol print-ir output for one of the affected domains?

Yannik avatar Feb 08 '23 16:02 Yannik

Could you kindly check that you really tested this with a corrected dnsconfig.js? Would it be possible for you to post a snippet of the dnscontrol print-ir output for one of the affected domains?

Sure! Could you send me your email address and I'll send it. You can send to me privately via this link: https://transfer.secretoverflow.com/u/tlimoncelli

tlimoncelli avatar Feb 08 '23 17:02 tlimoncelli

I was thinking about this command and I realized it might be better as a subcommand instead of part of preview/push. When I wanted to try out this command, a co-worker was making changes to the domain and I wished this wasn't part of "preview" or that there was a flag that would only do domain/zone purges, no DNS record updates. That's when I realized this might be better as a subcommand:

Maybe something like this?

dnscontrol purge [--zones|--domains] [--dry-run]

(The default would be to do both zones and domains)

TomOnTime avatar Mar 15 '23 22:03 TomOnTime

@tlimoncelli I will try to get back onto this in april, currently I am very short on time for working on this.

Yannik avatar Mar 17 '23 18:03 Yannik

The bad news is: The new ppreview code obsoletes all of the code in this PR. So, implementing this will have to be done differently.

The good news is: The new ppreview code makes implementing this much easier.

tlimoncelli avatar May 02 '24 20:05 tlimoncelli