dns icon indicating copy to clipboard operation
dns copied to clipboard

Parsing TXT records does not properly de-escape quotes

Open TomOnTime opened this issue 2 years ago • 5 comments

When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.

A line such as:

example.com. IN TXT "inner\"quote"

It gets parsed to: inner\"quote I was expecting: inner"quote

Maybe I'm misreading the RFC? It just seems that without removing the \ there's no way to know what the \ means in the resulting string.

I've created https://github.com/miekg/dns/pull/1383 to demonstrate the issue.

TomOnTime avatar Jun 20 '22 15:06 TomOnTime

[ Quoting @.***> in "[miekg/dns] Parsing TXT records doe..." ]

When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.

A line such as:

example.com. IN TXT "inner"quote"

It gets parsed to: inner"quote I was expecting: inner"quote

but if you put that back in the NewRR you lost the fact that it was quoted...

We've have multiple discussions about this throughout the years, mostly because the original RFC text sucks - I think the current behavior is correct

miekg avatar Jun 20 '22 15:06 miekg

"you lost the fact that it was quoted" -- I would imagine that the fact that it is ASCII 34 means it was quoted.

Can the string be formed any other way?

tlimoncelli avatar Jun 20 '22 19:06 tlimoncelli

trouble parsing what you mean here. I still believe current behavior is correct, as calling this should give the same output:

NewRR -> String() -> NewRR on that string -> String() etc.

miekg avatar Jul 07 '22 10:07 miekg

Sorry for being unclear.

Question: Should newRR() simply remove the outer quotes (if they exist) or should it de-escape interior double-quotes too?

I expected it to do the latter. I wrote a TestTXTEscapeJustParse which only does the first half of what TestTXTEscapeParsing does. That way I can see the intermediate result.

It looks like NewRR() doesn't de-escape interior quotes. (i.e. replace \" with ").

  • Input to NewRR: "escaped\"quote"
  • Expected output of result.String(): escaped"quote
  • Actual output of result.String(): escaped\"quote

When we round-trip the strings by calling sprintTxt(), we get the original input, because it ignores the \.

If you look at https://github.com/miekg/dns/pull/1383/files, you'll see the a 2 tests, one should fail and the other should pass.

$ go test ./...
--- FAIL: TestIsPacketConn (0.00s)
    client_test.go:74: unable to run test server: listen unixpacket /var/folders/j8/8pxbmmjs6xjbd4nxjpd_q0rm0000gp/T/TestIsPacketConn2401358967/002/unixpacket.sock: socket: protocol not supported
--- FAIL: TestTXTEscapeJustParse (0.00s)
    parse_test.go:163: mismatch after parsing `"escaped\"quote"` TXT record: `escaped\"quote` != `escaped"quote`
FAIL
FAIL	github.com/miekg/dns	0.874s
ok  	github.com/miekg/dns/dnsutil	0.130s
FAIL

If no de-escaping is intended, then there's no bug.

Tom

TomOnTime avatar Jul 07 '22 17:07 TomOnTime

We've just run into a related issue. In this comment, X represents a string containing the character a 253 times. Then:

  • a TXT with zone file content "X0123456789" parses into slice X01 + 23456789 (as expected)
  • a TXT with zone file content "X\;0123456789" parses into slice X\; + 0123456789 (this seems like a bug: the backslash and semicolon are each counted as a byte, and the string split earlier than necessary, even though this escape sequence would only produce a single byte in wire format)

The String() function removes unnecessary backslashes, so parsing and then calling String() on something like "X\;a" "test" turns it into "X;" "a" "test", adding an unnecessary string. In my opinion, the way to go would be to remove unnecessary backslashes immediately upon parsing instead of in String(). This seems more logical and consistent to me: a string in a parsed dns.TXT would then really only contain the contents of that string, without additional zone file syntax. AAAA addresses, for example, are also stored in canonical form (e.g., unnecessary explicit zeros are removed during parsing) so there should be no need to keep literal backslashes in TXT records, right?

(For simplicity, all strings in this comment are raw/literal strings, without Go syntax. In other words, every backslash above is a literal backslash and every quote is a literal quote, such as a quote in a zone file.)


Edit: I've opened #1540 to fix the immediate issue, though I still think it'd be smart not to store escape sequences in parsed data structures (just like JSON escape sequences in strings aren't preserved when using Unmarshal, etc.), even though that's technically a breaking change.

janik-cloudflare avatar Feb 15 '24 12:02 janik-cloudflare