Document `reverse_dns` remap function and add local caching

Open binarylogic opened this issue 3 years ago • 14 comments

Reverse DNS lookup for an IP address.

Example

.domain = reverse_dns("8.8.8.8")

Result:

{
	"domain": "dns.google"
}
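
For context, a reverse lookup is a PTR query against a name derived from the IP address. The sketch below shows only that derivation, in Python with the standard library; the helper name `ptr_query_name` is made up for illustration and says nothing about how Vector would implement `reverse_dns`.

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup for `ip` would query.

    Hypothetical helper for illustration only; it performs no network I/O.
    """
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("8.8.8.8"))    # 8.8.8.8.in-addr.arpa
print(ptr_query_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

The resolver then asks for the PTR record of that name, and the answer becomes the `domain` value.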

binarylogic avatar Oct 12 '20 03:10 binarylogic

A closely related request mentioned on discord: doing a whois lookup.

JeanMertz avatar Dec 03 '20 10:12 JeanMertz

Noting, this issue is currently blocked due to decisions around where to draw the line with Remap functions (covered in #3740). See https://github.com/timberio/vector/pull/5647#issuecomment-750921635. We are hesitant to add functions that issue network calls on a per-event basis since this could severely hinder performance.

binarylogic avatar Jan 04 '21 19:01 binarylogic

@binarylogic I briefly looked into #5647. It would be fairly trivial to add "runtime state" to VRL. Basically, we'd initialize the runtime at boot, and then allow state to be tracked by individual functions across executions.

This would allow us to cache the DNS lookup calls (for the DNS TTL duration), which would significantly reduce the overhead.
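
A rough Python sketch of the idea, not actual VRL internals: the `Runtime` class, `state_for`, and the injected `resolve` callable are all hypothetical names. The runtime is created once, and each function gets a state dictionary that survives across executions.

```python
from typing import Callable, Dict


class Runtime:
    """Hypothetical stand-in for a runtime initialized once at boot."""

    def __init__(self) -> None:
        self._state: Dict[str, dict] = {}

    def state_for(self, function_name: str) -> dict:
        # Each function gets its own dict that persists between executions.
        return self._state.setdefault(function_name, {})


def reverse_dns(runtime: Runtime, ip: str, resolve: Callable[[str], str]) -> str:
    """Look up `ip`, calling `resolve` only when the answer is not cached."""
    cache = runtime.state_for("reverse_dns")
    if ip not in cache:
        cache[ip] = resolve(ip)
    return cache[ip]


# Usage with a fake resolver that records how often it is called.
calls = []


def fake_resolve(ip: str) -> str:
    calls.append(ip)
    return "dns.google"


runtime = Runtime()
for _ in range(3):
    domain = reverse_dns(runtime, "8.8.8.8", fake_resolve)
print(domain, len(calls))  # dns.google 1
```

This version never expires entries; honoring the record TTL would be layered on top.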

We can defer this until after launch, but it seems that would mostly resolve the downside of allowing network calls in functions, at least in this specific case.

JeanMertz avatar Jan 04 '21 20:01 JeanMertz

Is there any news regarding this kind of functionality?

Secondly, would the current implementation allow you to specify the DNS server, or does it just take the one configured OS wide?

jerome-kleinen-kbc-be avatar May 28 '21 07:05 jerome-kleinen-kbc-be

Hi @jeromekleinen-kbc . Nothing yet, but it's still on our radar. This one is a bit more complex than other VRL functions given that it would require caching to provide acceptable throughput.

> Secondly, would the current implementation allow you to specify the DNS server, or does it just take the one configured OS wide?

My instinct would just be to use the OS configured resolver, but would you find it useful to specify a set of DNS servers to use? We could make that an optional flag.

jszwedko avatar Jun 01 '21 18:06 jszwedko

@jszwedko my intent of specifying the DNS server is to set up a local DNS cache to avoid overloading the corporate DNS servers. I guess when there is native caching in vector this becomes less important. Perhaps one use case could be that depending on the hostname you might want to pick a different DNS server, either because of the domain or because of some zonality to reduce latency.

Native caching would be cool, but I guess it comes with its own challenges, e.g. do you just follow the TTL or let the cache expire on a different schedule, what is the maximum size of the cache, etc.

jerome-kleinen-kbc-be avatar Jun 02 '21 07:06 jerome-kleinen-kbc-be

> @jszwedko my intent of specifying the DNS server is to set up a local DNS cache to avoid overloading the corporate DNS servers. I guess when there is native caching in vector this becomes less important. Perhaps one use case could be that depending on the hostname you might want to pick a different DNS server, either because of the domain or because of some zonality to reduce latency.

I agree, it makes sense to allow specifying optional custom DNS servers. We might not add it in the first iteration, but it would be fairly trivial to add as a follow-up.

> Native caching would be cool, but I guess it comes with its own challenges, e.g. do you just follow the TTL or let the cache expire on a different schedule, what is the maximum size of the cache, etc.

The thinking right now would be:

  • in-memory cache (expires if Vector restarts)
  • using TTL of the DNS record, up to a hard-coded maximum
  • limited to a fixed number of cached DNS records, invalidating older records as newer ones are added
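
The three points above could look roughly like this Python sketch. It is an illustration under assumptions, not Vector code: the class name `DnsCache`, its parameters, and the injectable `clock` are invented here, and eviction is by insertion order (oldest record first), which is one reading of "invalidating older records".

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class DnsCache:
    """In-memory cache: contents are lost when the process restarts."""

    def __init__(self, max_entries: int, max_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()
        self._max_entries = max_entries
        self._max_ttl = max_ttl  # hard upper bound on any record's TTL
        self._clock = clock

    def get(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        domain, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]  # expired: drop it and report a miss
            return None
        return domain

    def put(self, ip: str, domain: str, record_ttl: float) -> None:
        ttl = min(record_ttl, self._max_ttl)  # follow the record TTL, capped
        self._entries.pop(ip, None)           # re-insert as the newest entry
        self._entries[ip] = (domain, self._clock() + ttl)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest record


# Usage with a fake clock so that expiry is deterministic.
now = [0.0]
cache = DnsCache(max_entries=2, max_ttl=60.0, clock=lambda: now[0])
cache.put("8.8.8.8", "dns.google", record_ttl=3600.0)
print(cache.get("8.8.8.8"))  # dns.google
now[0] = 60.0                # the 3600 s TTL was capped at 60 s
print(cache.get("8.8.8.8"))  # None
```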

JeanMertz avatar Jun 02 '21 12:06 JeanMertz

> The thinking right now would be:
>
>   • in-memory cache (expires if Vector restarts)
>   • using TTL of the DNS record, up to a hard-coded maximum
>   • limited to a fixed number of cached DNS records, invalidating older records as newer ones are added

It would be cool if both the max TTL and the maximum cache size could be configurable, but I understand that this would add two additional global options.

Just for reference, this is what logstash offers: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dns.html

jerome-kleinen-kbc-be avatar Jun 02 '21 13:06 jerome-kleinen-kbc-be

Requested in discord: https://discord.com/channels/742820443487993987/746070591097798688/870424592047550524

jszwedko avatar Jul 29 '21 22:07 jszwedko

v0.17.0 contains #8717 which implements this but without caching for now. That is already very useful to have, thanks!

ypid-geberit avatar Oct 11 '21 09:10 ypid-geberit

Thanks @ypid-geberit !

I'll leave this open until we publish it in the documentation (likely after we add caching).

jszwedko avatar Oct 12 '21 14:10 jszwedko

DNS Caching would be very useful for me and greatly improve performance.

coredump17 avatar Dec 19 '23 11:12 coredump17

In which version will DNS caching be added?

jiaozi07 avatar Feb 01 '24 02:02 jiaozi07

No timeline yet, unfortunately.

jszwedko avatar Feb 01 '24 18:02 jszwedko

I suggested something similar (not having seen this ticket) in https://github.com/vectordotdev/vrl/issues/720 recently but would be happy to also see a local cache for performance gain. There needs to be the ability to more granularly specify lookups that are performed, which is why I wrote my feature request with a fairly long list of standard DNS options that might be supported. Just doing reverse lookups is not good enough, I think - there is no need to limit lookups to PTR records, and I see that as kind of a strange thing to implement as a standalone function in the first place.

I also believe the result back from a DNS lookup (even if cached) should be structured in a DNSTAP-style message so that it can be parsed in the most flexible way possible for all QTYPEs, not just PTRs. If the function is written in such a way that it allows specification of things like timeouts and retries, this can be done in a way that is optimally performant without high risk of blocking, especially if VRL client stub timeout values can be changed from the standard "seconds" to "milliseconds". The local cache will still get a copy of the response and store it even if the first Vector event has moved on after a timeout. The next request after the forwarding DNS server has completed its request will get a very rapid reply, and the cache will keep the answer until the TTL expires. So one or two requests may suffer, but after that point things will accelerate greatly (less than 0.3ms even across ~5-meter distant adjacent, 1-switch separated fairly busy physical servers for most answers in our case using UDP, and faster still if the caching resolver is on localhost - possibly an order of magnitude but I haven't tested.)
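
To make the timeout behavior concrete, here is a minimal Python sketch. It is not an existing VRL or Vector API: `BackgroundResolver`, its parameters, and the fake `slow_resolve` are hypothetical. The first caller gives up after a short timeout, the lookup keeps running on a worker thread, and its answer lands in the cache for later callers. TTL handling and de-duplication of in-flight lookups are left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Optional


class BackgroundResolver:
    def __init__(self, resolve: Callable[[str], str], timeout: float) -> None:
        self._resolve = resolve
        self._timeout = timeout  # seconds the caller is willing to wait
        self._cache: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _resolve_and_store(self, ip: str) -> str:
        domain = self._resolve(ip)
        with self._lock:
            self._cache[ip] = domain  # stored even if the caller gave up
        return domain

    def lookup(self, ip: str) -> Optional[str]:
        with self._lock:
            if ip in self._cache:
                return self._cache[ip]
        future = self._pool.submit(self._resolve_and_store, ip)
        try:
            return future.result(timeout=self._timeout)
        except FutureTimeout:
            return None  # the event moves on; the worker still finishes

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def slow_resolve(ip: str) -> str:
    time.sleep(0.2)  # simulate a slow upstream DNS server
    return "dns.google"


resolver = BackgroundResolver(slow_resolve, timeout=0.02)
first = resolver.lookup("8.8.8.8")   # times out, returns None
time.sleep(0.5)                      # let the background lookup finish
second = resolver.lookup("8.8.8.8")  # served from the cache
resolver.close()
print(first, second)
```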

While I appreciate Vector not allowing "footguns", it also is the case that without this particular footgun I will have to write an entirely separate downstream pipeline for many of the objects going through Vector, which sort of invalidates much of the savings I am getting from consolidating our event stream into processing by one codebase at the edge of our widely-dispersed network. In other words: I must have DNS lookups done on specific sets of objects before sending them along; it's not just a "nice to have." Lua can do this (I think?) with os.execute and I suppose I need to start looking at that, but I suspect/assert it's possible to do this much faster from within a native function.

johnhtodd avatar Mar 06 '24 23:03 johnhtodd