pdns icon indicating copy to clipboard operation
pdns copied to clipboard

gsqlbackend: add optimized getAuth

Open zeha opened this issue 5 years ago • 21 comments

Short description

Cuts SOA queries for 1.2.3.4.5.6.7.8.9.example.com down from 10 to 1.

Checklist

I have:

  • [x] read the CONTRIBUTING.md document
  • [x] compiled this code
  • [x] tested this code
  • [x] included documentation (including possible behaviour changes)
  • [ ] documented the code
  • [ ] added or modified regression test(s)
  • [ ] added or modified unit test(s)

zeha avatar Jul 09 '20 20:07 zeha

That LIKE looks expensive, is it worth it?

Habbie avatar Aug 25 '20 21:08 Habbie

That LIKE looks expensive, is it worth it?

Thats a really good question, and I think the answer might be "depends on your actual data and query patterns". It appears to be better than hitting the DB in a loop (latency, etc), but I cannot really put numbers to this.

zeha avatar Sep 08 '20 21:09 zeha

This is not the approach we're looking for, as it is too expensive on the database.

pieterlexis avatar Feb 04 '21 12:02 pieterlexis

Do we have actual numbers on how expensive it is?

rgacogne avatar Feb 04 '21 12:02 rgacogne

If you have a million domains, the LIKE has to do a million compares. @mind04 informally shared some terrible numbers with me but I don't have them right now.

Habbie avatar Feb 04 '21 12:02 Habbie

I think "LIKE" is indeed the wrong approach, although LIKE can be speed up massively by using proper indexes (we do that in postgres with a trigram(reverse(name)) index and also reversing the qname in the query) - so no million compares.

Anyways, if it should be fast, I think the getSOA logic should be done in a function/stored-procedure: Search the domain, if not found, cut of first label and try again.

klaus-nicat avatar Feb 08 '21 21:02 klaus-nicat

On PG it wasn't -that- bad, and you can stick a funny index in ;)

zeha avatar Feb 08 '21 21:02 zeha

Right, I didn't think about the reversed indexes. I'm absolutely open to some variant of this that also works for million zone setups, perhaps limited to certain backends, with config options, etc.

Habbie avatar Feb 08 '21 21:02 Habbie

Anyways, if it should be fast, I think the getSOA logic should be done in a function/stored-procedure: Search the domain, if not found, cut of first label and try again.

For PG it is possible to use ANY() with an array of strings filled with the all possible zones: SELECT content, ttl, prio, type, domain_id, disabled::int, name, auth::int FROM records WHERE disabled = false and type = 'SOA' AND name = ANY($1::text[]). I have a PoC somewhere, but I couldn't find a way to do that with other SQL engines.

rgacogne avatar Feb 09 '21 09:02 rgacogne

Yes, that's perfect, reduces roundtrips (but not necessarily 'database operations'). For some SQL engines I suspect it would involve a temporary table...

Habbie avatar Feb 09 '21 09:02 Habbie

By the way, cutting off the first label in SQL is tricky, since a label can contain dots..

rgacogne avatar Feb 09 '21 09:02 rgacogne

By the way, cutting off the first label in SQL is tricky, since a label can contain dots..

Yes, I'd expect the list of names to come from pdns.

Habbie avatar Feb 09 '21 09:02 Habbie

I think all SQL servers can do something similar, but the syntax isn't very universal :(

The general idea would be to construct a "table" that exists during this statement only; usually that would envolve SQLs VALUES keyword. PG has ANY/IN and unnest() as shortcuts, not sure which bits of those are in the standard or widely supported.

zeha avatar Feb 09 '21 10:02 zeha

ANY was not supported by MySQL last time I checked. What is neat about it is that it allows passing an array of values to a prepared statement, while all other options I could find seem to allow some kind of injection. But my SQL is rusty so I might have missed a better option :)

rgacogne avatar Feb 09 '21 10:02 rgacogne

I closed this based on 'LIKE %foo' is slow, which is not a universal truth. Now that discussion is ongoing, let's reopen this at least.

Habbie avatar Feb 09 '21 10:02 Habbie

ANY was not supported by MySQL last time I checked.

Apparently it's now in 8.0.x!

All the interesting sounding functionality is new in 8.0:

  • https://dev.mysql.com/doc/refman/8.0/en/any-in-some-subqueries.html
  • https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
  • https://dev.mysql.com/doc/refman/8.0/en/values.html

I'm somewhat disappointed by this discovery.

zeha avatar Feb 09 '21 10:02 zeha

By the way, cutting off the first label in SQL is tricky, since a label can contain dots..

How does this work? Do you have an example?

klaus-nicat avatar Feb 09 '21 11:02 klaus-nicat

How does this work? Do you have an example?

www.example.com is a 3 label name. www.www.example.com is a 4 label name. www\.www.example.com is a 3 label name, and the first label is www.www

Habbie avatar Feb 09 '21 11:02 Habbie

Ah I see - so the logic for the DB would be: do not split at . but at .

klaus-nicat avatar Feb 09 '21 11:02 klaus-nicat

Ah I see - so the logic for the DB would be: do not split at . but at .

You fell into the escaping trap while typing your comment!

And the DB would also not split at \046, which is yet another way of writing a non-splitting dot. So I think it would make more sense if the splitting happened on the PowerDNS end. To keep things short, we could send a label list instead of a list of all truncation variants, but that's an implementation detail and a label list might be a lot harder to work with in SQL..

Habbie avatar Feb 09 '21 11:02 Habbie

I think this can be closed, as zone-cache-refresh-interval is even more efficient.

klaus-nicat avatar Sep 23 '24 20:09 klaus-nicat