dnscontrol icon indicating copy to clipboard operation
dnscontrol copied to clipboard

Add a TTL model

Open str4d opened this issue 11 months ago • 5 comments

This unifies the logic for handling the dnscontrol-default TTL, and frees up the TTL value 0 for use by providers (e.g. Linode, which uses it as its sentinel for its default TTL).

Closes StackExchange/dnscontrol#2444.

str4d avatar Jul 10 '23 01:07 str4d

With this change, I am able to fully resolve #2440 by adding an explicit DefaultTTL(0) to all of my DSP_LINODE sections. It would be slightly cleaner if providers could override the global DefaultTTL for themselves, but that would be a more complex change to the API, whereas just having models.TTL differentiate between "unset" and "set to 0" is simpler to reason about.

It will likely be necessary for providers to be checked for previously-hidden assumptions about the value 0; before when parsing this from the provider into dnscontrol it would be silently coerced to models.DefaultTTL in other places, whereas now 0 is preserved unless explicitly checked for by the provider and replaced by models.EmptyTTL().

str4d avatar Jul 10 '23 01:07 str4d

Hi there! I'm on vacation so please expect delays.

Thank you for taking the time to work on this issue. The proposed change is pretty fundamental and will require extensive discussion and testing.

I have a number of over-arching concerns:

  1. I have a concern about code complexity. Would it be less complex to declare TTL=0 to be the default? It is not a valid TTL, thus we should be able to use it. This would eliminate a lot of the complexity especially the use of pointers.
  2. Do we want to support a default TTL at all? Every vendor has a different default TTL. Therefore depending on the default makes the zone non-portable to other providers. It also makes the use of dual-hosted domains more difficult. I could be convinced that zone portability is rare and insignificant. anyway... maybe we should simply change any "default" to (for example) 1 day; the next "push" would remove the use of default from the zone. Then all our our domain data is explicit.

Code-specific suggestions:

  1. Rather than add to dns.go, please move the TTL-related code to models/ttl.go
  2. ValueRef(): Is there a better term for this?
  3. A lot of tests are breaking, especially for BIND. cd integrationTest && go test -v -verbose -provider BIND

TomOnTime avatar Jul 11 '23 00:07 TomOnTime

Thank you for taking the time to work on this issue. The proposed change is pretty fundamental and will require extensive discussion and testing.

Yep, as expected! I will say that I'm not claiming this PR is the optimal approach (I am not a Go or JS developer, and have fumbled my way into something that compiles and works with a specific provider), but my hope is that it helps us to coalesce around a viable approach.

I have a number of over-arching concerns:

  1. I have a concern about code complexity. Would it be less complex to declare TTL=0 to be the default? It is not a valid TTL, thus we should be able to use it. This would eliminate a lot of the complexity especially the use of pointers.

In this PR I didn't do that because I was trying to preserve dnscontrol's global default (to not affect the existing user API) while also enabling the full unsigned integer range to be used for per-provider TTL values (as I have not surveyed what values providers actually use for different settings).

If the global dnscontrol default were either removed, or set up to be the fallback if an individual provider doesn't have its own default, and we confirmed that TTL=0 isn't used as any other kind of sentinel by any provider, then yeah using TTL=0 for this would simplify the implementation.

As an aside, my first local attempt at this PR did not use pointers, instead opting for an Option-based approach (I'm a Rust developer 🦀) via a third party dependency that dnscontrol already depended on elsewhere. However, one single provider within dnscontrol required that the TTL value could itself be referenced via a pointer, which was not possible for the non-pointer Option type. I did not spend the time to figure out if that was a hard requirement for the provider, and instead switched to using a pointer inside an explicit TTL model (the latter being a better approach anyway).

  1. Do we want to support a default TTL at all? Every vendor has a different default TTL. Therefore depending on the default makes the zone non-portable to other providers. It also makes the use of dual-hosted domains more difficult. I could be convinced that zone portability is rare and insignificant. anyway... maybe we should simply change any "default" to (for example) 1 day; the next "push" would remove the use of default from the zone. Then all our our domain data is explicit.

I personally don't particularly care whether you want to support a default TTL or not. My own issue is "solved" by this PR, in that I'm wanting to move my DNS settings away from Linode, and all I wanted was to confirm that my dnsconfig.js was correct (the existing diff view makes it very hard to verify this). So I don't actually need this PR to be merged.

However, there doesn't appear to be a clear decision one way or the other regarding default TTLs, and the implicit decision to map 0 to 300 instead of having separate "unconfigured" or "default" states causes problematic interactions with individual providers (e.g. #2440). Even if there isn't support for any "default" TTL in the model, I think having a TTL model (and a way to tell when the user is using it) is useful.

Code-specific suggestions:

  1. Rather than add to dns.go, please move the TTL-related code to models/ttl.go

Done.

  1. ValueRef(): Is there a better term for this?

IDK, I can go with whatever is standard Go naming convention here. In Rust we use *_ref suffixes to denote APIs that return immutable references (as opposed to owned values).

  1. A lot of tests are breaking, especially for BIND. cd integrationTest && go test -v -verbose -provider BIND

Thanks, I couldn't figure out how to run the tests. It took me a bit to figure out that I actually had to set an environment variable to get this to run:

cd integrationTest && BIND_DOMAIN=example.com go test -v -verbose -provider BIND

And the test failure was because prettyzone was trying to print the TTL model directly instead of TTL.Value(), and so was writing the address of the pointer into the zone file 😅 Fixed.

Force-pushed with changes 1 and 3.

str4d avatar Jul 22 '23 20:07 str4d

I appreciate all the effort you've put into this. While this solution will work, I think there are other solutions that are more simple that will also work. I always favor the more simple solution.

I don't think we can solve this globally until first we decide how we want TTLs to work. Most of the TTL-handling code was written in 2016-ish when DNSControl had fewer formal design specifications (translation: we were guessing our way through most design decisions and a lot of specifics were just implemented without a formal design). As a result, each provider handles things slightly differently. Eventually we should do an RFC and get feedback from the community about how TTLs should work, get consensus, then implement a solution based on that feedback.

In the meanwhile, I think we should fix this just for Linode rather than doing a global change that affects nearly all providers.

Therefore, I'm asking you to start a different PR that fixes this for Linode only.

Thanks for understanding.

tlimoncelli avatar Sep 07 '23 21:09 tlimoncelli

I appreciate all the effort you've put into this. While this solution will work, I think there are other solutions that are more simple that will also work. I always favor the more simple solution.

@tlimoncelli If you know of a simpler solution, I am all ears!

The core problem is the aliasing everywhere of TTL = 0; this needs to be addressed somehow. The JS model uses it to mean both "nothing configured" and "user explicitly configured 0: https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/pkg/js/helpers.js#L100-L109 https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/pkg/js/helpers.js#L251-L257 https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/pkg/js/helpers.js#L1048-L1051 https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/pkg/js/helpers.js#L222-L228

The Go model dns.RecordConfig is parsed directly from the JS model:

https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/models/record.go#L86-L92

dns.RecordConfig is used directly by every single provider, so any solution that alters the type of TTL (as this PR does) is by definition a global solution. Your request therefore prevents me from doing this.

Additionally, individual providers never see RecordConfig.TTL = 0, because of normalize.ValidateAndNormalizeConfig: https://github.com/StackExchange/dnscontrol/blob/af91e37043d49ce1d5f8ecc54c0196d972772bae/pkg/normalize/validate.go#L348-L354

Delaying or changing this normalization would require changes to every single provider to handle the fact that their assumptions about default TTL handling have been broken. So I can't alter that either if I am to satisfy your request.

As a result of the above two implementation details, there is currently no way for the Linode provider to distinguish these three cases:

  • No setting, falling back on dnscontrol's global default TTL of 300 (changing this to instead fall back on the provider's default TTL is something I presume would be part of the RFC you describe).
  • The user explicitly configuring DefaultTTL(0) (which should always be interpreted as a valid Linode TTL setting).
  • The user explicitly configuring DefaultTTL(300).

I'll be honest, I do not see a simpler solution than "change the type of TTL". I see (what I believe to be) a more complex solution that allows for an incremental migration, however:

  • Add a new TTL field to dns.RecordConfig with a different name (say, TypedTTL).
  • Connect TypedTTL to the JS model in one of two ways:
    • Add similar duplicate fields to the JS model, and handle user updates to them in parallel with the existing defaultTTL and ttl fields.
    • Change the JS model to be nullable (like this PR) and then change the Go parser to parse the ttl field into TypedTTL, and then copy the parsed value into the existing TTL field (aliasing None and Some(0) to 0).
  • Change every single location in the codebase that modifies dns.RecordConfig.TTL to also modify TypedTTL, except for normalize.ValidateAndNormalizeConfig (which would leverage TypedTTL's None case as meaning "use the dnscontrol global default").
    • Hopefully no providers are modifying TTL so I don't have to modify their code.
  • Change the Linode provider to use TypedTTL instead of TTL.
    • All other providers are left using TTL as-is.

If this approach is along the lines of what you were thinking, then I'll go ahead and implement it. And if there is some aspect of dnscontrol that I'm completely overlooking (as a non-Go developer or as a new contributor) that would make this way simpler, I'd love to know.

str4d avatar Sep 09 '23 20:09 str4d