hnsd icon indicating copy to clipboard operation
hnsd copied to clipboard

dns: fix name parse/serialization and support escapes (big fix 1/4)

Open pinheadmz opened this issue 3 years ago • 5 comments

Refactored out of bloated https://github.com/handshake-org/hnsd/pull/76

Implement escaped characters when reading and writing names. This means we can lose the weird internal "special bytes" 0xff and 0xfe which were used to represent \0 and . respectively, internally, as a "hack" around C strings.

So:

\! -> !
\061 -> a
\000 -> 0x00
\\ -> \

...etc

I based this patch on the writeName and readName functions in bns encoding.js

I noticed that knot resolver uses is_punc() but JJ used a switch with a list of the special characters 🤷

Another quirk is that hnsd hsk_dns_name_serialize() serves two functions, depending on if char *name is passed, which is report size, or actually write the name. In the PR right now I have basically the same code block twice which seems smelly, could use some inspiration on how or if to optimize that.

pinheadmz avatar Dec 22 '21 14:12 pinheadmz

Unfortunately, this is not gonna work.

Internal structures that use char *name have specific lengths predefined, you can see all of those in src/dns.h and it is 256. That is already a limit of allowed name length (dns allowed 255 octets + 0x00). This means that any attempt to "extend" the data further will not fit in the worst case 255 octet. E.g. punctuation/special characters will need +1 bytes where non printable characters will take +3 bytes. So super worst case scenario (255 non-printable characters) will just take 4 times more space (1020).

Current solution (replacing 0x00 by 0xff and 0x2e('.') by 0xfe wont have these issues, even though it will not correctly serialize/deserialize any name that contains 0xff and 0xfe bytes in it.

Side notes:

  • hsk_dns_name_sanitize would have needed upgrade as well (as it depends on 0xff and 0xfe logic.
  • I was thinking if we want to add compressed message tests here well, but it's better to do it in a different PR.

knot: yeah, they use is_punct - their defined function, instead of ispunct from libc, I guess because they dont want to deal with locale stuff.

nodech avatar Dec 23 '21 05:12 nodech

Internal structures that use char *name have specific lengths predefined, you can see all of those in src/dns.h and it is 256. That is already a limit of allowed name length (dns allowed 255 octets + 0x00). This means that any attempt to "extend" the data further will not fit in the worst case 255 octet. E.g. punctuation/special characters will need +1 bytes where non printable characters will take +3 bytes. So super worst case scenario (255 non-printable characters) will just take 4 times more space (1020).

Most DNS libs don't store the name in presentation format (with literal \ddd as string) internally. Knot stores it as uncompressed, dynamically allocated, uint8_t *name in wire format (predefined 256 wastes space but eh). ldns does something similar. A null terminator is a valid DNS octet and we already know the length of every label ... we should avoid C strings. Also, wire format means no issues with zero octet or 0x2e ('.'). Ideally, that's how we should do it not necessarily in this PR. Long term this is a worthwhile refactor. Current serialization in hnsd is broken for no good reason.

The current predefined size of 256 in internal structures should still fit any valid DNS name containing any byte 000 to 255 as long as names are not stored in presentation format. Presentation format foo\ddd can be converted before use. However, we lose information about the length and can't use strlen without the current 0x00 -> 0xff hack, or length must be stored somewhere.

We could also avoid supporting presentation format entirely and use C escape sequences \nnn for any proofs that use non-printable chars internally (or even just a byte array). They're in octal so DNS name name = "hello\\255" (length = 9) would use name = "hello\377" (length is only 6). We don't need presentation format except for printing human readable names in log output? still needs the hack for null terminator without storing length.

Current solution (replacing 0x00 by 0xff and 0x2e('.') by 0xfe wont have these issues, even though it will not correctly serialize/deserialize any name that contains 0xff and 0xfe bytes in it.

Hm I thought the issue with0x00 -> 0xff was only about incorrectly using C string functions for DNS names. This is unrelated to names in presentation format not fitting in the internal structures correct?

buffrr avatar Dec 23 '21 11:12 buffrr

Most DNS libs don't store the name in presentation format (with literal \ddd as string) internally. Knot stores it as uncompressed, dynamically allocated, uint8_t *name in wire format (predefined 256 wastes space but eh). ldns does something similar. A null terminator is a valid DNS octet and we already know the length of every label ... we should avoid C strings. Also, wire format means no issues with zero octet or 0x2e ('.'). Ideally, that's how we should do it not necessarily in this PR. Long term this is a worthwhile refactor. Current serialization in hnsd is broken for no good reason.

Yeah, it makes sense. Not using wire format internally only helps with some functions (makes them easier to read/write about ? e.g. compression), but I am not 100% sure why was this decision made.

The current predefined size of 256 in internal structures should still fit any valid DNS name containing any byte 000 to 255 as long as names are not stored in presentation format. Presentation format foo\ddd can be converted before use. However, we lose information about the length and can't use strlen without the current 0x00 -> 0xff hack, or length must be stored somewhere. We could also avoid supporting presentation format entirely and use C escape sequences \nnn for any proofs that use non-printable chars internally (or even just a byte array). They're in octal so DNS name name = "hello\255" (length = 9) would use name = "hello\377" (length is only 6). We don't need presentation format except for printing human readable names in log output? still needs the hack for null terminator without storing length.

I was wondering if we could just use first byte for size. It can store 0-255 and that's enough to measure rest of 255 bytes -- 256 bytes size of the structs wont change. With custom typedef and custom strlen usage will be the same. Unfortunately, I am not that familiar with the usage of these structs and can't say this would work. Maybe just plan for the refactor already.

Hm I thought the issue with0x00 -> 0xff was only about incorrectly using C string functions for DNS names. This is unrelated to names in presentation format not fitting in the internal structures correct?

Yeah, if original content of the wire contains 0xff and/or 0xfe on parse nothing will happen, but on serialize they will be converted to 0x00 and . incorrectly.

nodech avatar Dec 23 '21 12:12 nodech

I see what you guys are saying, there's no need to use presentation format internally except maybe the log... and we should check the compression functions but I don't think there's a need for strings there either. The best solution to this is stick to wire format internally and use byte arrays instead of strings. DNS wire format has a length byte anyway so we should just stick to that convention.

pinheadmz avatar Dec 23 '21 18:12 pinheadmz

Taking another look at this and getting a bit worried -- char *name is ubiquitous in this codebase. Everything expects names to be strings: the cache, the readers and writers, all the utilities that count labels and check for _synth and decode each record type and functions that look for '.' in strings...

I wonder what the performance cost is to just change the 256's in dns.h (in the typedefs for all the record types) to 1021... It's not like these strings stay in memory that long anyway. What really sucks up our resources in hnsd right now is 100,000 236-byte chain headers

diff --git a/src/dns.h b/src/dns.h
index 9716d15..bf241b4 100644
--- a/src/dns.h
+++ b/src/dns.h
@@ -6,8 +6,10 @@
 #include <stdlib.h>
 #include "map.h"
 
+#define HSK_DNS_MAX_STRING (255 * 4) + 1
+
 typedef struct hsk_dns_rr_s {
-  char name[256];
+  char name[HSK_DNS_MAX_STRING];
   uint16_t type;
   uint16_t class;
   uint32_t ttl;
@@ -57,7 +59,7 @@ typedef struct {
 } hsk_dns_unknown_rd_t;
 
 typedef struct {
-  char ns[256];
+  char ns[HSK_DNS_MAX_STRING];
   char mbox[256];
   uint32_t serial;
   uint32_t refresh;
@@ -85,15 +87,15 @@ typedef struct {
 } hsk_dns_loc_rd_t;
 
 typedef struct {
-  char target[256];
+  char target[HSK_DNS_MAX_STRING];
 } hsk_dns_cname_rd_t;
 
 typedef struct {
-  char target[256];
+  char target[HSK_DNS_MAX_STRING];
 } hsk_dns_dname_rd_t;
 
 typedef struct {
-  char ns[256];
+  char ns[HSK_DNS_MAX_STRING];
 } hsk_dns_ns_rd_t;
 
 typedef struct {
@@ -167,7 +169,7 @@ typedef struct {
   uint32_t expiration;
   uint32_t inception;
   uint16_t key_tag;
-  char signer_name[256];
+  char signer_name[HSK_DNS_MAX_STRING];
   size_t signature_len;
   uint8_t *signature;
 } hsk_dns_rrsig_rd_t;
@@ -185,7 +187,7 @@ typedef struct {
 } hsk_dns_rp_rd_t;
 
 typedef struct {
-  char next_domain[256];
+  char next_domain[HSK_DNS_MAX_STRING];
   size_t type_map_len;
   uint8_t *type_map;
 } hsk_dns_nsec_rd_t;

pinheadmz avatar Jan 11 '22 01:01 pinheadmz