gsqlbackend: add optimized getAuth
Short description
Cuts SOA queries for 1.2.3.4.5.6.7.8.9.example.com down from 10 to 1.
Checklist
I have:
- [x] read the CONTRIBUTING.md document
- [x] compiled this code
- [x] tested this code
- [x] included documentation (including possible behaviour changes)
- [ ] documented the code
- [ ] added or modified regression test(s)
- [ ] added or modified unit test(s)
That LIKE looks expensive, is it worth it?
That LIKE looks expensive, is it worth it?
Thats a really good question, and I think the answer might be "depends on your actual data and query patterns". It appears to be better than hitting the DB in a loop (latency, etc), but I cannot really put numbers to this.
This is not the approach we're looking for, as it is too expensive on the database.
Do we have actual numbers on how expensive it is?
If you have a million domains, the LIKE has to do a million compares. @mind04 informally shared some terrible numbers with me but I don't have them right now.
I think "LIKE" is indeed the wrong approach, although LIKE can be speed up massively by using proper indexes (we do that in postgres with a trigram(reverse(name)) index and also reversing the qname in the query) - so no million compares.
Anyways, if it should be fast, I think the getSOA logic should be done in a function/stored-procedure: Search the domain, if not found, cut of first label and try again.
On PG it wasn't -that- bad, and you can stick a funny index in ;)
Right, I didn't think about the reversed indexes. I'm absolutely open to some variant of this that also works for million zone setups, perhaps limited to certain backends, with config options, etc.
Anyways, if it should be fast, I think the getSOA logic should be done in a function/stored-procedure: Search the domain, if not found, cut of first label and try again.
For PG it is possible to use ANY() with an array of strings filled with the all possible zones: SELECT content, ttl, prio, type, domain_id, disabled::int, name, auth::int FROM records WHERE disabled = false and type = 'SOA' AND name = ANY($1::text[]). I have a PoC somewhere, but I couldn't find a way to do that with other SQL engines.
Yes, that's perfect, reduces roundtrips (but not necessarily 'database operations'). For some SQL engines I suspect it would involve a temporary table...
By the way, cutting off the first label in SQL is tricky, since a label can contain dots..
By the way, cutting off the first label in SQL is tricky, since a label can contain dots..
Yes, I'd expect the list of names to come from pdns.
I think all SQL servers can do something similar, but the syntax isn't very universal :(
The general idea would be to construct a "table" that exists during this statement only; usually that would envolve SQLs VALUES keyword. PG has ANY/IN and unnest() as shortcuts, not sure which bits of those are in the standard or widely supported.
ANY was not supported by MySQL last time I checked. What is neat about it is that it allows passing an array of values to a prepared statement, while all other options I could find seem to allow some kind of injection. But my SQL is rusty so I might have missed a better option :)
I closed this based on 'LIKE %foo' is slow, which is not a universal truth. Now that discussion is ongoing, let's reopen this at least.
ANYwas not supported by MySQL last time I checked.
Apparently it's now in 8.0.x!
All the interesting sounding functionality is new in 8.0:
- https://dev.mysql.com/doc/refman/8.0/en/any-in-some-subqueries.html
- https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html
- https://dev.mysql.com/doc/refman/8.0/en/values.html
I'm somewhat disappointed by this discovery.
By the way, cutting off the first label in SQL is tricky, since a label can contain dots..
How does this work? Do you have an example?
How does this work? Do you have an example?
www.example.com is a 3 label name. www.www.example.com is a 4 label name. www\.www.example.com is a 3 label name, and the first label is www.www
Ah I see - so the logic for the DB would be: do not split at . but at .
Ah I see - so the logic for the DB would be: do not split at . but at .
You fell into the escaping trap while typing your comment!
And the DB would also not split at \046, which is yet another way of writing a non-splitting dot. So I think it would make more sense if the splitting happened on the PowerDNS end. To keep things short, we could send a label list instead of a list of all truncation variants, but that's an implementation detail and a label list might be a lot harder to work with in SQL..
I think this can be closed, as zone-cache-refresh-interval is even more efficient.