SOA Serial does not reflect the version of the data being served
This is effectively a duplicate of #559, but with a broader description.
In DNS, the purpose of the SOA Serial is to tell the clients the version of the data currently being served.
This is NOT being fulfilled by simply serving out the current date & time as it fails to take into account when a server is out-of-date & is still catching up, e.g. due to maintenance downtime or connectivity issues.
This causes a problem when running two instances of hsd
(for failover) in conjunction with Buffrr's AXFR plug-in to feed the merged ROOT zone to one or more slaves. If one hsd
server is taken down for a day or two, then brought back up - it will immediately lie that it has the latest data, when in fact it is still catching up.
This can cause the data on a downstream slave to be rolled back to an earlier version & the slave will then not be updated until the clock marches forward.
I've pointed this out to various devs at various times, but so far it's not fixed (v3.0.0)
The timestamp on the last block that was included in the most recent urkel tree update seems a reasonable choice to me, or this timestamp could be converted into YYYYMMDDXX format, should you prefer, but many TLDs use unixtime as the SOA Serial these days.
Using any information that is always increasing, from the last block that was included in the most recent urkel tree update, will ensure that only when two servers are serving the same version of information will they return the same SOA Serial.
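For illustration, a minimal sketch (not hsd's actual code) of both serial formats, assuming `treeBlockTime` is the unix-seconds timestamp of that block:

function soaSerialUnix(treeBlockTime) {
  // Simplest option: use the unix time directly, as many TLDs do.
  return treeBlockTime >>> 0; // keep it within unsigned 32-bit range
}

function soaSerialDateStyle(treeBlockTime, revision = 0) {
  // Alternative YYYYMMDDXX form, where XX is a same-day revision counter.
  const d = new Date(treeBlockTime * 1000);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(d.getUTCDate()).padStart(2, '0');
  return Number(`${yyyy}${mm}${dd}`) * 100 + revision;
}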
@buffrr and I have discussed this a lot as well but the issue is how to handle re-orgs (more on that in a sec). Your point about getting SOA from full nodes that have not yet finished syncing however seems far more important than what we were thinking about.
The reorg issue is: Imagine block 36 is found, the Urkel tree and the HNS root zone will officially be updated. We can use the height 36 as the serial or the timestamp of that block in YYYYMMDDHH. A few minutes later, a chain reorganization occurs, changing the content of the Urkel tree and HNS root zone and creating an alternate block at the same height, maybe with the same timestamp, maybe even with an EARLIER timestamp (this is valid blockchain shit). Downstream DNS clients that already have the first copy of the root zone now have invalid data, inconsistent with other HNS resolvers, and will not be fixed until the next Urkel tree update in about 6 hours when, hopefully, a reorg does not occur and everyone is back on the same page.
One thing we could do is just always use the current chain tip height as the serial, not just the height of the last tree update.
- pro: reorgs handled automatically
- pro: downstream clients can tell a hsd node is not synced yet
- pro: all synced nodes are always in sync with same SOA serial
- con: serial will change every ten minutes even though root zone data probably hasn't changed
Another thing we can use instead of chain height is the Median Time Past which is the median time of the last 11 block timestamps. It is guaranteed to always increase unlike the individual block timestamps themselves. Again, we would have to update this every block to ensure that reorgs are properly handled.
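For reference, a rough sketch of that MTP calculation (variable names here are illustrative, not hsd's actual chain API):

function medianTimePast(recentBlockTimes) {
  // `recentBlockTimes` holds the timestamps of the last (up to) 11 blocks.
  const times = recentBlockTimes.slice(-11).sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

// Prints 150: the tip's own timestamp (175) is lower than the previous
// block's (190), but the median over 11 blocks is unaffected.
console.log(medianTimePast([100, 110, 130, 120, 140, 150, 160, 170, 180, 190, 175]));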
So,
What's the best tradeoff? Does the AXFR bridge repeatedly poll the SOA serial and only transfer when it's been updated? That would require more clever logic, or else you're going to be downloading the same root zone every ten minutes, probably.
Or,
is it "okay" to have invalid data for 6 hours and then just hope the next tree update goes smoothly?
I've pointed this out to various devs at various times, but so far it's not fixed (v3.0.0)
Sorry about this, we are under-staffed and the developers that are working on HNS core software have higher priorities that they feel affect more users. This is why writing a PR is often more effective than pointing things out or opening issues.
This causes a problem when running two instances of hsd (for failover) in conjunction with Buffrr's AXFR plug-in to feed the merged ROOT zone to one or more slaves. If one hsd server is taken down for a day or two, then brought back up - it will immediately lie that it has the latest data, when in fact it is still catching up.
I have an open issue about this here trying to solve this in the plugin. The idea is to wait 12 blocks before serving the zone so that we can have a semi-globally consistent SOA serial across multiple hsd instances serving the exact same zone (assuming no reorgs larger than 12 blocks, I think).
is it "okay" to have invalid data for 6 hours and then just hope the next tree update goes smoothly?
Ideally, hsd shouldn't really serve any zone data before it's fully synced. Simply restarting hsd will cause all kinds of unexpected issues and break lots of sites because it serves stale data that recursive resolvers will cache for a couple of hours. In the worst case, it will serve compromised DS keys that site owners have updated, but users will still be vulnerable because they will get the old key.
Yes ok I like this a lot, and we were already discussing the disparity between hsd and hnsd -- a good solution is for hsd to ALSO use "safe height for resolution" (12 blocks) and then use the timestamp from the tree interval block as the SOA serial. There will be some confused users as we deploy this but it does seem like that covers everything.
we can also use chain.maybeSync() to determine if it is safe to resolve names. The worst case scenario is that this returns true when the node is still 12 hours behind:
https://github.com/handshake-org/hsd/blob/df997a4c305eb8e730fefbaf12bf5bc041293aa3/lib/blockchain/chain.js#L2877-L2894
https://github.com/handshake-org/hsd/blob/df997a4c305eb8e730fefbaf12bf5bc041293aa3/lib/protocol/networks.js#L400-L405
In the worst case, it will serve compromised DS keys that site owners have updated, but users will still be vulnerable because they will get the old key.
I would hope, in that situation the stale DS
would fail to validate & all the data should be discarded
con: serial will change every ten minutes even though root zone data probably hasn't changed
TBH, for me that would rule that out - IMHO that's terrible
Ideally, hsd shouldn't really serve any zone data before it's fully synced
Yeah - that's fair - if you can do that, it would be the best
on a (non-blockchain) high volume DNS server I wrote in the past, that used a push journal stream update, you could keep journal blocks coming in while the DNS server was down & it would rush-process them (at start-up) to catch up before serving any data. It would also ask the upstream the latest journal serial number & wait until it had at least reached that serial before serving data - it meant there could be a small number of journal blocks that hadn't been processed before it starts serving, but it would only be a few & they'd get processed very quickly after serving started.
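Roughly, the start-up logic described above looked like this (a conceptual sketch with made-up names, not hsd or that server's real code):

async function catchUpThenServe(upstream, journal, server) {
  // Ask the upstream for its latest journal serial...
  const target = await upstream.latestSerial();

  // ...and rush-process queued journal blocks until we've at least reached it.
  while (journal.currentSerial() < target)
    await journal.applyNextBlock();

  // Only now is it safe to start answering queries.
  server.start();
}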
Have to say, it did slightly disturb me that a new install of hsd
would take about 6 to 8 hrs to sync up, but was more than happy to start serving data right away.
If hsd never served any data until it was fully sync'd then the SOA Serial becomes less relevant - cos it would only ever serve latest data - no response would be fine by me - but even then, only changing the SOA Serial when the underlying data has changed would be nice - right now it changes every hour, even though data may, or may not, have changed - but that's not a big deal at all cos the first thing I do is an AXFR->IXFR conversion, so it will detect the SOA Serial is the only thing that's changed.
This is why writing a PR is often more effective
oh, sure, but as you can see from the detail of the discussion, there's no way I'd ever come up with anything suitable, just not enough background knowledge!
the median time of the last 11 block timestamps
so long as there's near 100% chance of it always increasing - seems fine to me
Does the AXFR bridge repeatedly poll the SOA serial and only transfer when it's been updated?
Not sure about the bridge, but this is exactly what the slave will be doing - SOA polling over UDP looking for an increased Serial
There will be some confused users as we deploy this
Any DNS s/w should be able to automatically cope with the serial number dropping from YYYYMMDDXX
to unixtime - there's a bunch of rules about it. Serial numbers are allowed to roll-round.
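The rules referred to here are RFC 1982 serial number arithmetic: serials are 32-bit values compared modulo 2^32, which is what allows them to roll round. A minimal sketch of that comparison:

function serialIsNewer(oldSerial, newSerial) {
  // Both serials are treated as unsigned 32-bit integers (RFC 1982).
  if (oldSerial === newSerial)
    return false;

  const HALF = 2 ** 31;
  return (newSerial > oldSerial && newSerial - oldSerial < HALF) ||
         (newSerial < oldSerial && oldSerial - newSerial > HALF);
}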
Have to say, it did slightly disturb me that a new install of
hsd
would take about 6 to 8 hrs to sync up, but was more than happy to start serving data right away.
I assume this was installed on HDD, syncing up on an SSD shouldn't take longer than 3 hours.
I would hope, in that situation the stale DS would fail to validate & all the data should be discarded
If an attacker was in the middle, it should have no problem giving DNS answers signed by the old key (acting as the TLD's authoritative server). Of course, I'm talking about the worst case here, and this only works on not-yet-synced hsd nodes.
we can also use chain.maybeSync() to determine if it is safe to resolve names. The worst case scenario is that this returns true when the node is still 12 hours behind:
Any chance of having a similar function to maybeSync() but with a stricter criterion for resolving that checks for a smaller maxTipAge, like 6 hours or even 2 hours? Something like:
isReady() {
  if (!this.synced)
    return false;

  return this.tip.time >= this.network.now() - 2 * 60 * 60;
}
Hmm is there a case where this could return false forever or take way too long?
Block timestamps are only required to be greater than MTP (Median Time Past), which is the median of the last 11 block timestamps and usually ends up being about an hour behind actual UTC time. They are also required to be no greater than two hours ahead of actual UTC time.
Sometimes (not often, but it happens) blocks can take over an hour for miners to find. So there is a case where an in-fact-fully-synced node would stop resolving because the chain tip timestamp is more than 2 hours in the past.
I think 12 hours is probably OK for this, but we can still compromise on 6 hours, which makes sense anyway since that's the tree interval on mainnet.
Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago. Question for you DNS experts: how long do you normally expect a DNSSEC update to propagate through DNS anyway?
I assume this was installed on HDD, syncing up on an SSD shouldn't take longer than 3 hours.
yeah, HDD, but RAID - so not the worst case scenario - also 3 hrs of giving out incorrect NXDOMAIN
answers isn't too great
And as time goes on, this will only get longer & longer - currently participation in this project is relatively low - for example, ICANN ROOT servers get terabytes of queries each, every day.
It's a real shame the DNS data couldn't be separated from all the auction & $HNS transaction data - but I can see splitting off where proof-of-ownership actually occurs is tricky without all the supporting evidence.
it should have no problem giving DNS answers signed by the old key
if the scenario is you changed the DS
cos the private keys were compromised, yeah I guess - lag is always a bitch
Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago
Or you give out NXDOMAIN
for a TLD that really does exist
But if you change a DS or NS (assuming it's not an emergency), it would be reasonable to expect the old values to continue to work for at least 24 hrs, but probably more like at least a week.
how long do you normally expect a DNSSEC update to propagate through DNS anyway?
If the zone changes their DNSKEY & DS (and all their RRSIGs) right away, any new RRSIGs will fail to validate (against the old key data) & ALL the associated data should be discarded from cache - so it should be pretty quick. But exactly what is discarded is very implementation-dependent, so it can take longer; DNSSEC data & successful validations are often kept, cos they can take a while to complete.
CloudFlare do it correctly & will flip almost immediately - most others wait for the TTL on validated keys to expire before dropping them. Cos 8.8.8.8
is a cluster of many servers, this means the old values are dropped gradually over a few hrs.
Obviously it also depends on DS
change propagation time in the parent zone - in most ICANN TLDs customers now expect this to be live - i.e. within a few seconds (def <1min) of posting the change with the registrar.
Even if we started resolving 24-hour-old data the worst case is that a key is trusted that was revoked less than one day ago.
Yeah, even 24-hour is an improvement. If we can make it 6 hours, that's even better (assuming no issues)
how long do you normally expect a DNSSEC update to propagate through DNS anyway?
If the zone changes their DNSKEY & DS (and all their RRSIGs) right away, any new RRSIGs will fail to validate (against the old key data)
Yeah, a proper key rollover should be performed. If proofofconcept is a popular site with lots of users, it would be a bad idea to just replace the DS in the parent; this will cause an outage.
For example, you have this DS record in the root zone:
proofofconcept. 21600 IN DS 5362 8 2 <some digest>
To "roll" the DS, you should first add a new one (while still keeping the old DS).
proofofconcept. 21600 IN DS 5362 8 2 <some digest>
proofofconcept. 21600 IN DS 55367 8 2 <some digest>
Resolvers may still have the old DS RRSet cached for proofofconcept; they don't know yet about the key with tag 55367. So you should keep your zone signed with the old key.
For a safe DS rollover:
- Add the new DS to the root zone (while still keeping the old one).
- Wait for the Handshake root zone to serve the new DS RRSet.
- Then, wait for the old DS RRSet to expire from resolvers' cache (depends on TTL; for hsd it's 6 hours IIRC).
- Also, change your DNSKEY RRSet. Add the new DNSKEY(s) to your authoritative server (still signed by the old key). Keep the old DNSKEYs too!
- Wait for the TTL of your old DNSKEY RRSet to expire. Resolvers should see your updated DNSKEY RRSet.
- Now you can start signing with the new key.
- Remove the old DS with tag 5362 from the root zone (it's no longer used) and remove the old DNSKEYs.
Rolling a DS safely requires two updates to the root zone. Alternatively, you can always have an emergency standby DS added that you keep secure somewhere. If the active DS/DNSKEYs are compromised, you can just remove them and start using the new ones immediately. This requires one update to remove the old DS.
Of course, this area is still improving so some better techniques may come up.
For a safe DS rollover:
I think adding the new keys before adding the new DS
is more reliable.
Adding the new DS first doesn't universally work (in ICANN zones). I've tried it before. It works with bind, CloudFlare & Google, but not some proprietary implementations, like Zscaler (there may be others) - TBH it's a fecking nightmare and the RFCs do contradict each other at times.
I was trying to move from one DNSSEC signing provider to another (for a large client). In the end the conclusion was that the only way to do it was to go unsigned for a while!! ... although I think I've got a plan that would work now
IMHO the best thing to do is add the new keys, then add the new DS, wait, then remove the old DS, then remove the old keys.
One RFC says so long as there is any path to validate any RRSIG then you should accept the data; another says there MUST be a validation path for EVERY DS present.
The powers that be™ are aware of this contradiction and plan to issue a clarification.
If you read the official methodology for changing external signing provider, you'll discover that there isn't a single piece of DNS s/w that supports it!
Changing KSK algorithm is also a nightmare.
I can totally see why a lot of well known sites just don't use DNSSEC - there's little tangible advantage (advantages an MBA could measure), but there are all sorts of nasty corner cases that can bring your site down. PowerDNS does a good job of making it a lot easier to implement.
I think adding the new keys before adding the new DS is more reliable.
Having an additional DS without a corresponding DNSKEY is okay and this was mentioned in the very early DNSSEC RFCs, but it may get tricky when changing algorithms. I like the DNSKEY first more actually because it allows your new DNSKEY(s) to propagate in resolvers cache while still waiting for the Handshake root zone to update. So you can do both at the same time actually, especially if your DNSKEYs will propagate faster (depends on TTL). RFC 7583 is dedicated to this and explains the drawbacks of different techniques but doesn't cover algorithm changes.
One RFC says so long as there is any path to validate any RRSIG then you should accept the data
Yup that should be the case.
another says there MUST be a validation path for EVERY DS present
This may be tricky when considering message digest algorithm and DNSKEY algorithm downgrades. I can see why this is useful. If two trust anchors are present, one with a stronger algorithm and one weaker, a validating resolver may want to favor the stronger. Mainstream resolvers don't do this though because they have to accept any valid path.
I can totally see why a lot of well known sites just don't use DNSSEC - there's little tangible advantage
There's a small advantage to securing A/AAAA records. The WebPKI threat model doesn't rely on DNSSEC. DANE is the killer app and what makes it worth it.
TBH its a fecking nightmare and the RFCs do contradict each other at times
Yeah there's confusion there and some resolvers try to interpret the RFCs more strictly. I think what makes DNSSEC hard is having to think about those TTLs and the global cache. Validating resolvers should perhaps try to be more lenient and request new keys when validation fails although this could increase load or introduce new forms of denial of service since any bogus answer would trigger multiple queries.
I like the DNSKEY first more actually because it allows your new DNSKEY(s) to propagate in resolvers cache
Right - with PowerDNS you can also propagate keys "inactive" first, which is nice - it's their recommended method for ZSK rollover.
Someone should write a plugin for hsd to automate rollovers perhaps by querying CDS/CDNSKEY records ;) Since we can easily update the root zone, parent/child communication should really be automated.
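As a rough sketch of what such a plugin might do (all helper names here - lookupCDS, getDS, updateRootDS - are hypothetical, not an existing hsd or bns API):

async function maybeRollDS(name, resolver, rootZone) {
  // Ask the child zone what DS set it wants published (RFC 7344 CDS records).
  const cds = await resolver.lookupCDS(name);        // hypothetical helper
  if (!cds || cds.length === 0)
    return; // no CDS published; nothing to automate

  const current = rootZone.getDS(name);              // hypothetical helper
  if (!sameDSSet(current, cds))
    await rootZone.updateRootDS(name, cds);          // hypothetical helper
}

function sameDSSet(a, b) {
  const key = (rr) => `${rr.keyTag}/${rr.algorithm}/${rr.digestType}/${rr.digest}`;
  return a.length === b.length &&
         new Set(a.map(key)).size === new Set([...a, ...b].map(key)).size;
}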
Ok so if I can boil this discussion down to a set of code changes we all agree on, I'll open a PR:
1. Like the SPV node, the Full Node should wait 12 confirmations after each Urkel Tree update before resolving records from the updated root zone (see getSafeRoot() in chain.js).
2. The SOA serial should be the timestamp in the first block header after each tree update (i.e. the first block header to commit to the updated tree root hash, which according to (1) was at least 11 blocks ago).
3. The hsd (and hnsd) root server should send REFUSED (or is there something better?) until the chain is "synced", which means the timestamp in the chain tip (most recent block) is within the last 6 hours. This is a different definition of "synced" than is used elsewhere in the code; that's OK.
This will:
- prevent old records from being served while a node is still syncing
- guarantee that SOA serial is always increasing, and only changes when the root zone actually changes
- even if there is a chain reorg, but only if that reorg is < 12 blocks deep (which is like, we got bigger problems then)
- synchronize the responses from full and light nodes
- HNS users will have to get used to an extra 2-hour wait when updating records on chain 😬
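A rough sketch of how points (1)-(3) could hang together, with made-up constant and function names (not actual hsd internals):

const SAFE_CONFIRMATIONS = 12;      // point (1): wait 12 blocks after a tree update
const SYNC_WINDOW = 6 * 60 * 60;    // point (3): "synced" = tip within 6 hours

function isSynced(tipTime, now) {
  return tipTime >= now - SYNC_WINDOW;
}

function safeToServeTree(treeUpdateHeight, tipHeight) {
  // Only serve a root zone snapshot once its tree update has 12 confirmations.
  return tipHeight - treeUpdateHeight >= SAFE_CONFIRMATIONS;
}

function soaSerial(treeCommitBlockTime) {
  // Point (2): the timestamp of the first block committing to the safe tree root.
  return treeCommitBlockTime >>> 0;
}

// Rough server logic: if !isSynced(...), answer REFUSED instead of serving stale
// records; otherwise serve the zone built from the safe tree root, with
// soaSerial(...) as the SOA serial.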
The hsd (and hnsd) root server should send REFUSED (or is there something better ?)
SERVFAIL (rcode=2) would be what I'd expect to see:
2 Server failure - The name server was
unable to process this query due to a
problem with the name server.
sounds like a typical techie comment :laughing:
Sounds good! We can use the Extended DNS Errors (RFC 8914) EDNS option to indicate that the chain is syncing, just to make it easier to differentiate from other types of SERVFAIL.
We can use the Extended DNS Errors (RFC 8914) EDNS option
Sounds like a fine plan, you would need to check the client asked using EDNS, of course
Generally, SERVFAIL is actually remarkably rare for authoritative servers - I've only seen it when bind is still loading the zone data - so pretty much the exact same scenario. Also, the standard behavior is to just try the next NS - this comment is very typical, IMHO:
Fortunately, most domains use multiple authoritative DNS servers, so if there is a short-lived ServFail issue
on one name server which doesn’t impact the others, DNS lookups should still work. That said, if a name
server has chronic ServFail issues, we recommend investigating why. ServFail errors happen, but should be rare.
SERVFAIL is actually remarkably rare, for authoritative servers
yup REFUSED is usually used if they don't want to or aren't ready to answer (it really just depends on preference). For example, Knot DNS (an authoritative server) will give a REFUSED answer if it's not ready to respond to an AXFR request (if transfers are temporarily paused). The REFUSED answer uses Extended DNS Errors with EDE error code 14 (Not Ready):
4.15. Extended DNS Error Code 14 - Not Ready
The server is unable to answer the query, as it was not fully functional when the query was received.
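For what it's worth, the wire format of that option is simple. A sketch of building the RFC 8914 option payload (EDNS0 option code 15) with info-code 14, independent of whichever DNS library (e.g. bns) ends up attaching it to the OPT record:

function edeNotReadyOption(extraText = 'chain is still syncing') {
  const text = Buffer.from(extraText, 'utf8');
  const data = Buffer.alloc(2 + text.length);

  data.writeUInt16BE(14, 0);   // INFO-CODE 14 = Not Ready
  text.copy(data, 2);          // optional UTF-8 EXTRA-TEXT, not NUL-terminated

  return { code: 15, data };   // OPTION-CODE 15 = Extended DNS Error
}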
yup REFUSED is usually used
yeah, sounds better - it's a fine line. RFC 1035 says REFUSED is to be used for policy reasons - so it's mostly seen for permissions issues (auth servers that refuse to answer RD=1, resolvers with ACLs for who can use them, etc), but you could easily make a case that not serving out-of-date data is a policy decision -> "name server may not wish to perform a particular operation" does fit.
5 Refused - The name server refuses to
perform the specified operation for
policy reasons. For example, a name
server may not wish to provide the
information to the particular requester,
or a name server may not wish to perform
a particular operation (e.g., zone
transfer) for particular data.