Rec: serve stale
Short description
Showing this so others can take a look and play with it. The general description of the mechanism is in recursor_cache.cc.
Missing: docs and proper handling of cached records with ECS (they currently expire too soon). I have to think about how this can be tested in regression tests; basic unit tests are there.
I have been testing this by blackholing outgoing DNS requests using my firewall.
Enable it by setting serve-stale-extensions; the unit is 30s. So setting it to 100 will serve a stale record for up to 3000 seconds after it became stale.
Checklist
I have:
- [X] read the CONTRIBUTING.md document
- [X] compiled this code
- [X] tested this code
- [ ] included documentation (including possible behaviour changes)
- [X] documented the code
- [ ] added or modified regression test(s)
- [X] added or modified unit test(s)
So, when serve-stale is enabled and we fail to get a response, we increase the TTD by up to 1440s, and during that time we only try to get a new value once, as a background task, possibly very soon after the first failed attempt? I'm a bit worried about using a stale record long after the auth has recovered. I wonder if we could retry more often?
The extra time per extension is 30s, or less if the original TTL was lower. The max number of extensions is 1440. So I don't think your worry is warranted. There should be a retry scheduled every 30s, as long as the record keeps getting queried. If that is not happening, there is a bug...
OK, then I did not understand it correctly! I think you should discard my comments until we are sure I'm looking at the correct version of recursor_cache.cc; otherwise you might be wasting your time.
Right. Will rebase soon as well.
recursor_cache.cc botch fixed and rebased. I plan to continue working on this in the coming period. I will be doing more tests plus a review of the approach and code.
Rebased on master to resolve a conflict.