Increase default `lookup_timeout` from 250ms to 5s

Open maggie44 opened this issue 1 year ago • 1 comments

I get a lot of errors in the logs like these, particularly when starting the service for the first time:

ERRO[0000] DNS resolution failed for static_map host     error="lookup example.com: i/o timeout" hostname=example.com network=ip4

When using multiple lighthouses, some of them resolve fine, others time out, and eventually they all resolve on later loops. DNS resolution is working, but the timeout is sometimes reached before the lookup has time to finish.

The timeout for these requests is currently set at 250ms. This is extremely low, and I can't see any reason for it.

Here are some example defaults from elsewhere for some precedent:

https://github.com/istio/istio/blob/ac901c3ed1a2455705709bd5e81df781d7a63083/pilot/pkg/util/network/ip.go#L145

https://github.com/tailscale/tailscale/blob/a4a909a20b0f868de4870294e200e803f61589f7/ipn/localapi/debugderp.go#L161

This PR raises the default timeout to 5s.

maggie44 avatar Feb 19 '24 19:02 maggie44

Thanks for the contribution! Before we can merge this, we need @maggie44 to sign the Salesforce Inc. Contributor License Agreement.

salesforce-cla[bot] avatar Feb 19 '24 19:02 salesforce-cla[bot]

Hi @maggie44 -

Thanks for the contribution. We discussed this a bit and the reason that this timeout is set lower is so that one bad resolver / address can't hold up the entire DNS sub-routine.

You can see here that each DNS address is processed in serial: https://github.com/slackhq/nebula/blob/master/remote_list.go#L124-L135

While it doesn't block the hot path, it could result in a longer period to establish connections to the Lighthouse, if an early address is slow to resolve.

Ultimately, this is configurable so that you can increase the timeout if necessary. We're going to leave the default as-is for the time being - that said, if others continue to run into this, we are open to revisiting the default.

Cheers!

johnmaguire avatar Apr 01 '24 18:04 johnmaguire