
Use strict order for upstreams

Open agneevX opened this issue 2 years ago • 25 comments

Currently blocky...

Blocky picks 2 random resolvers from the list for each query and returns the answer from the fastest one. This improves your network speed and increases your privacy - your DNS traffic will be distributed over multiple providers.

This works very well, but is not desirable when you want to use a known resolver as primary all the time and want to use a secondary resolver only as backup.

I propose adding an option to query the first resolver in the list, then falling back to secondary and so on... after any of the following:

  • Timeout is reached (half of the upstreamTimeout value maybe)
  • REFUSED is returned. Google does this for some queries containing ECS data. More on that issue here.
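
A minimal Go sketch of the fallback order proposed above, for illustration only. The `reply` and `upstream` types, `resolveStrict`, `fake`, and the per-attempt timeout parameter are all hypothetical and not taken from blocky's code; only the numeric response codes come from RFC 1035.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// DNS response codes as defined in RFC 1035.
const (
	rcodeNoError = 0
	rcodeRefused = 5
)

// reply and upstream are hypothetical stand-ins, not blocky types.
type reply struct {
	Rcode  int
	Answer string
}

type upstream struct {
	Name    string
	Resolve func(ctx context.Context, question string) (reply, error)
}

// resolveStrict tries upstreams in list order. It moves to the next one when
// a query fails, exceeds perTry, or comes back REFUSED.
func resolveStrict(ctx context.Context, ups []upstream, question string, perTry time.Duration) (reply, string, error) {
	lastErr := errors.New("no upstreams configured")
	for _, u := range ups {
		tryCtx, cancel := context.WithTimeout(ctx, perTry)
		r, err := u.Resolve(tryCtx, question)
		cancel()
		switch {
		case err != nil:
			lastErr = fmt.Errorf("%s: %w", u.Name, err)
		case r.Rcode == rcodeRefused:
			lastErr = fmt.Errorf("%s: REFUSED", u.Name)
		default:
			return r, u.Name, nil
		}
	}
	return reply{}, "", lastErr
}

// fake builds an upstream that answers with rcode after delay d, unless the
// context expires first.
func fake(name string, d time.Duration, rcode int) upstream {
	return upstream{Name: name, Resolve: func(ctx context.Context, _ string) (reply, error) {
		select {
		case <-time.After(d):
			return reply{Rcode: rcode, Answer: "answer from " + name}, nil
		case <-ctx.Done():
			return reply{}, ctx.Err()
		}
	}}
}

// demoStrict returns the name of the upstream that ends up answering.
func demoStrict() string {
	ups := []upstream{
		fake("slow-primary", 500*time.Millisecond, rcodeNoError), // exceeds the per-try timeout
		fake("refusing", 0, rcodeRefused),                        // answers REFUSED
		fake("backup", 0, rcodeNoError),                          // usable answer
	}
	_, from, err := resolveStrict(context.Background(), ups, "example.com.", 50*time.Millisecond)
	if err != nil {
		return "error: " + err.Error()
	}
	return from
}

func main() {
	fmt.Println(demoStrict()) // prints "backup"
}
```

The per-attempt timeout is passed in separately so that it could be set to a fraction of the overall upstream timeout, as suggested in the first bullet.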

agneevX avatar Dec 03 '21 14:12 agneevX

I also would like an option to configure the upstream policy.

Maybe we could implement a configuration enum like:

  • parallel_best (old behavior, default)
  • strict (behavior as described in the first post)
  • random (one request to a random resolver in the list)

This would mirror the first two options in adguard home. (I don't get the third option and never used it 😅)
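
To make the idea concrete, here is a rough Go sketch of such an enum with a parser that defaults to the old behavior. The type name `UpstreamStrategy` and the function `ParseUpstreamStrategy` are assumptions made for this example, not existing blocky identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// UpstreamStrategy is a hypothetical enum for the three proposed policies.
type UpstreamStrategy int

const (
	ParallelBest UpstreamStrategy = iota // old behavior, default
	Strict                               // first resolver, then fall back in order
	Random                               // one request to a random resolver
)

var strategyNames = map[string]UpstreamStrategy{
	"parallel_best": ParallelBest,
	"strict":        Strict,
	"random":        Random,
}

// ParseUpstreamStrategy maps a config string to a strategy. An empty string
// selects the default so existing configs keep working.
func ParseUpstreamStrategy(s string) (UpstreamStrategy, error) {
	key := strings.ToLower(strings.TrimSpace(s))
	if key == "" {
		return ParallelBest, nil
	}
	if v, ok := strategyNames[key]; ok {
		return v, nil
	}
	return ParallelBest, fmt.Errorf("unknown upstream strategy %q", s)
}

// String returns the config name of the strategy.
func (s UpstreamStrategy) String() string {
	for name, v := range strategyNames {
		if v == s {
			return name
		}
	}
	return fmt.Sprintf("UpstreamStrategy(%d)", int(s))
}

func main() {
	for _, in := range []string{"", "strict", "Random", "bogus"} {
		st, err := ParseUpstreamStrategy(in)
		fmt.Printf("%q -> %v (err: %v)\n", in, st, err)
	}
}
```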

kwitsch avatar Dec 03 '21 21:12 kwitsch

Current implementation was designed to combine privacy with performance:

  • blocky picks 2 random resolvers (weighted: upstream resolvers with errors are penalized) and returns the answer from the fastest one
  • If you define 10 upstream resolvers, each receives only about 20% of your DNS traffic

We can provide additional "strategies", like strict, random or random weighted based on upstream resolver response time.
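
As a simplified illustration of the behavior described above, the following Go sketch picks two upstreams at random and returns the first successful answer. It is not blocky's implementation: the weighting and error penalty are left out, and the `upstream` type and all function names are hypothetical. With 10 upstreams and 2 picked per query, each one sees 2/10 = 20% of queries on average.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// upstream is a hypothetical stand-in for a configured resolver.
type upstream struct {
	Name    string
	Resolve func(ctx context.Context, question string) (string, error)
}

type result struct {
	answer, from string
	err          error
}

// pickRandom returns up to k distinct upstreams chosen uniformly at random.
func pickRandom(ups []upstream, k int) []upstream {
	if k > len(ups) {
		k = len(ups)
	}
	picked := make([]upstream, 0, k)
	for _, i := range rand.Perm(len(ups))[:k] {
		picked = append(picked, ups[i])
	}
	return picked
}

// resolveParallelBest queries two random upstreams concurrently and returns
// the first answer that arrives without an error.
func resolveParallelBest(ctx context.Context, ups []upstream, question string) (string, string, error) {
	picked := pickRandom(ups, 2)
	if len(picked) == 0 {
		return "", "", errors.New("no upstreams configured")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()                            // stops the slower query once a winner is found
	results := make(chan result, len(picked)) // buffered so the loser never blocks
	for _, u := range picked {
		go func(u upstream) {
			answer, err := u.Resolve(ctx, question)
			results <- result{answer: answer, from: u.Name, err: err}
		}(u)
	}
	var lastErr error
	for range picked {
		r := <-results
		if r.err == nil {
			return r.answer, r.from, nil
		}
		lastErr = r.err
	}
	return "", "", lastErr
}

// delayed builds a fake upstream that answers after d unless cancelled first.
func delayed(name string, d time.Duration) upstream {
	return upstream{Name: name, Resolve: func(ctx context.Context, _ string) (string, error) {
		select {
		case <-time.After(d):
			return "answer from " + name, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}}
}

// demoParallelBest races a fast and a slow upstream and reports the winner.
func demoParallelBest() string {
	ups := []upstream{delayed("fast", 0), delayed("slow", 300*time.Millisecond)}
	_, from, err := resolveParallelBest(context.Background(), ups, "example.com.")
	if err != nil {
		return "error: " + err.Error()
	}
	return from
}

func main() {
	fmt.Println(demoParallelBest()) // prints "fast"
}
```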

0xERR0R avatar Dec 07 '21 09:12 0xERR0R

Maybe we can also implement a "hyperlocal" mode: Blocky works as a recursive resolver and doesn't rely on any upstream resolver. That means blocky would recursively ask the corresponding name servers and cache the results. This would significantly improve privacy, but would probably be slow for queries with many subdomains.

Any thoughts?

0xERR0R avatar Dec 10 '21 11:12 0xERR0R

What do you mean by "corresponding name server"? Do you mean something like Unbound?

agneevX avatar Dec 10 '21 12:12 agneevX

Yes, like unbound, but in blocky. In this case we can reuse blocky's cache and provide additional prometheus metrics.

0xERR0R avatar Dec 10 '21 13:12 0xERR0R

A few things I've observed when I used to use Unbound to query root name servers:

  • Queries to root name servers take far too long. Queries taking >1000ms are fairly common.
  • It's fully unencrypted and uses ports 53/udp and 53/tcp. My ISP immediately hijacks that and redirects to their BIND server.

agneevX avatar Dec 10 '21 14:12 agneevX

I don't think it would be feasible to include a recursive DNS server option. Most users wouldn't use it, as forwarding DNS servers are more common. Therefore it would most likely just increase the binary size.

In my setup there are multiple unbound instances as upstream resolvers for blocky. Even I wouldn't use an internal recursive option, as this would reduce my fault tolerance and configuration options.

kwitsch avatar Dec 10 '21 17:12 kwitsch

I'm currently trying to migrate from Pi-Hole to Blocky, since it is much better suited for running on K8s, but this issue is currently blocking me from doing so, unless I'm missing another option. I want the LanCache DNS server to always be preferred if it is available.

My current Setup, with Pi-Hole using strict order, looks like

Router --- Pi-Hole --- LanCache --- Unbound
              \_______________________/

With Blocky, I think currently the only options would be

Router --- LanCache --- Blocky --- Unbound

or

Router --- Blocky --- LanCache --- Unbound

with LanCache being a SPOF since both Blocky and Unbound have multiple replicas.

reitermarkus avatar Feb 13 '22 19:02 reitermarkus

@reitermarkus is that the Steam LAN thing?

If so, it should not be a problem if LC answers queries faster than your other upstreams.

agneevX avatar Feb 14 '22 05:02 agneevX

Yes, it's for caching Steam games, among other things.

Well, my other upstream is Unbound running in the same cluster, so it's quite likely that LanCache will not be significantly faster, if at all.

reitermarkus avatar Feb 14 '22 05:02 reitermarkus

I'm not sure about blocky, but I know AdGuard Home has a Fastest IP feature that does exactly what you want.

agneevX avatar Feb 14 '22 06:02 agneevX

Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work if you can figure out which DNS names are used (for example steamcontent.com for Steam, maybe others?). Did you try this approach?

0xERR0R avatar Feb 14 '22 07:02 0xERR0R

> AdGuard Home has a Fastest IP feature that does exactly what you want.

I had a look at AdGuard Home before finding Blocky, but it has the same issue as Pi-Hole: no easy way to have multiple replicas.

> Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work

That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?

reitermarkus avatar Feb 14 '22 19:02 reitermarkus

> That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?

No, blocky will ask your lancache instance, and if it returns NXDOMAIN, there is no fallback. Isn't that the desired behaviour, since lancache will return either the IP of the local cache or the origin IP?

0xERR0R avatar Feb 14 '22 20:02 0xERR0R

The problem is that if LanCache is down, I cannot resolve any cached domains. Basically, I want to be able to download game updates even if LanCache is down for whatever reason.

Currently, this works by having LanCache as first DNS server, and if it is down, fall back to the next, i.e. downloads fall back to using the uncached upstream IP.

reitermarkus avatar Feb 14 '22 21:02 reitermarkus

Is this how pihole works? If one upstream DNS is down, it tries the second one (and not round-robin)? That means, if you query for example for "google.com", the pihole will ask your lancache instance first. Does lancache return NXDOMAIN, or will it resolve this query properly (by using some external resolver)?

0xERR0R avatar Feb 14 '22 21:02 0xERR0R

> Is this how pihole works?

Not by default, but since it uses DNSmasq, I can configure it to use strict order.

> the pihole will ask your lancache instance first. Does lancache return NXDOMAIN, or will it resolve this query properly (by using some external resolver)?

LanCache will resolve it, using Unbound as upstream. And the same Unbound server acts as the fallback DNS server in Pi-Hole.

So in case LanCache is running:

Pi-Hole -> LanCache -> Unbound

In case LanCache is down:

Pi-Hole -> Unbound

reitermarkus avatar Feb 14 '22 21:02 reitermarkus

ok, got it. The requested "strict order resolution" will solve this challenge. With conditional mapping, you won't get the fallback resolution.

0xERR0R avatar Feb 14 '22 21:02 0xERR0R

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Aug 04 '22 09:08 github-actions[bot]

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 03 '22 04:11 github-actions[bot]

Hi all, I would like to contribute both the strict & random (non weighted) resolvers.

Where should we add the new upstreamStrategy config field? Should we start by adding it as a global enum that configures the strategy for all upstream groups? If there are use cases for a group-scoped enum, we could still discuss adding it later on.

DerRockWolf avatar Jul 28 '23 15:07 DerRockWolf

> Hi all, I would like to contribute both the strict & random (non weighted) resolvers.

That sounds good! :+1:

Currently, we do have the "upstream" section and related "UpstreamTimeout". The "upstream" section is not a nested struct, but only a map (historical reasons). It would be better to have all upstream related configurations in a separate structure, but in this case we'll introduce breaking changes. So I think it would be better, for now, to introduce a new top-level config enum "upstreamStrategy" and refactor the "ParallelBestResolver" to extract the resolver selection logic, for example into a separate interface, so we can implement more strategies later.
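
One possible shape for such an interface, as a hedged Go sketch. The `selector` interface, both implementations, and `newSelector` are hypothetical names made up for this example and do not reflect how blocky is actually structured. Only the strict and random strategies are shown.

```go
package main

import (
	"fmt"
	"math/rand"
)

// selector is a hypothetical interface: given the configured upstreams, it
// returns the ones to query, in the order they should be tried.
type selector interface {
	Select(upstreams []string) []string
}

// strictSelector keeps the configured order, so the first entry is primary.
type strictSelector struct{}

func (strictSelector) Select(ups []string) []string {
	return append([]string(nil), ups...) // copy so callers cannot mutate config
}

// randomSelector returns a single upstream chosen uniformly (non weighted).
type randomSelector struct{}

func (randomSelector) Select(ups []string) []string {
	if len(ups) == 0 {
		return nil
	}
	return []string{ups[rand.Intn(len(ups))]}
}

// newSelector maps the config value to an implementation.
func newSelector(strategy string) (selector, error) {
	switch strategy {
	case "strict":
		return strictSelector{}, nil
	case "random":
		return randomSelector{}, nil
	default:
		return nil, fmt.Errorf("unknown upstream strategy %q", strategy)
	}
}

func main() {
	ups := []string{"lancache", "unbound"}
	for _, name := range []string{"strict", "random"} {
		s, err := newSelector(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(name, "->", s.Select(ups))
	}
}
```

Keeping the selection step behind an interface means the code that sends queries would not need to change when another strategy is added.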

0xERR0R avatar Jul 28 '23 19:07 0xERR0R

> Currently, we do have the "upstream" section and related "UpstreamTimeout". The "upstream" section is not a nested struct, but only a map (historical reasons). It would be better to have all upstream related configurations in a separate structure, but in this case we'll introduce breaking changes.

I've got local changes to allow having more config there, and be back-compat. Basically I also renamed it to upstreams instead of upstream, so we can use our standard option deprecation flow.
The main goal of those changes is to have parallel init for upstreams (#835). It's almost done so I could make a PR soon. But I think I can even split the config change so we can merge that quicker and @DerRockWolf can use that as a base.

EDIT: so if you, @DerRockWolf, have already started some work, don't worry too much about the config, just add something to the big Config struct, and moving your struct into the one I created should be easy :)

> refactor the "ParallelBestResolver" to extract the resolver selection logic, for example into a separate interface, so we can implement more strategies later.

Related to #1001

ThinkChaos avatar Jul 28 '23 21:07 ThinkChaos

Bad weather gave me a bit of extra time today, so I opened #1086 with just the config change.

ThinkChaos avatar Jul 28 '23 23:07 ThinkChaos

@agneevX my PR (#1093) implementing the strict strategy doesn't tackle:

> REFUSED is returned. Google does this for some queries containing ECS data.

The "upstream resolver" contacting the upstream DNS server only returns err if it didn't get a reply. The responses are returned as received, regardless of the DNS message response codes.

This is also currently the case for the parallel_best resolver. If Google DNS replies REFUSED and wins the race, blocky will return that REFUSED response from Google.

We would need to implement custom handling based on the DNS response codes.
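
A small Go sketch of what such response-code handling might look like. Everything here is an assumption for illustration: the `reply` type, `shouldFallback`, `firstUsable`, and in particular the choice to also treat SERVFAIL as a fallback trigger. The numeric codes themselves are the RFC 1035 values.

```go
package main

import "fmt"

// DNS response codes as defined in RFC 1035.
const (
	rcodeNoError  = 0
	rcodeServFail = 2
	rcodeNXDomain = 3
	rcodeRefused  = 5
)

// reply is a hypothetical stand-in for an upstream response.
type reply struct {
	From  string
	Rcode int
}

// shouldFallback reports whether a reply should be skipped in favour of
// another upstream. Treating SERVFAIL this way is an assumption; NXDOMAIN is
// kept because it is a valid answer, not a failure.
func shouldFallback(rcode int) bool {
	return rcode == rcodeRefused || rcode == rcodeServFail
}

// firstUsable scans replies in arrival order and returns the first one that
// does not call for a fallback. If none qualifies, it returns the last reply
// and false, so the client still gets a response.
func firstUsable(replies []reply) (reply, bool) {
	for _, r := range replies {
		if !shouldFallback(r.Rcode) {
			return r, true
		}
	}
	if len(replies) > 0 {
		return replies[len(replies)-1], false
	}
	return reply{}, false
}

func main() {
	replies := []reply{
		{From: "google", Rcode: rcodeRefused}, // won the race but refused
		{From: "other", Rcode: rcodeNoError},
	}
	r, ok := firstUsable(replies)
	fmt.Println(r.From, ok) // prints "other true"
}
```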

DerRockWolf avatar Aug 12 '23 18:08 DerRockWolf