dns
dns copied to clipboard
Parsing TXT records does not properly de-escape quotes
When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.
A line such as:
example.com. IN TXT "inner\"quote"
It gets parsed to: inner\"quote
I was expecting: inner"quote
Maybe I'm misreading the RFC? It just seems that without removing the \
there's no way to know what the \
means in the resulting string.
I've created https://github.com/miekg/dns/pull/1383 to demonstrate the issue.
[ Quoting @.***> in "[miekg/dns] Parsing TXT records doe..." ]
When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.
A line such as:
example.com. IN TXT "inner"quote"
It gets parsed to: inner"quote I was expecting: inner"quote
but if you put that back in the NewRR you lost the fact that it was quoted...
We've have multiple discussions about this throughout the years, mostly because the original RFC text sucks - I think the current behavior is correct
"you lost the fact that it was quoted" -- I would imagine that the fact that it is ASCII 34 means it was quoted.
Can the string be formed any other way?
trouble parsing what you mean here. I still believe current behavior is correct, as calling this should give the same output:
NewRR -> String() -> NewRR on that string -> String() etc.
Sorry for being unclear.
Question: Should newRR() simply remove the outer quotes (if they exist) or should it de-escape interior double-quotes too?
I expected it to do the latter. I wrote a TestTXTEscapeJustParse
which only does the first half of what TestTXTEscapeParsing
does. That way I can see the intermediate result.
It looks like NewRR() doesn't de-escape interior quotes. (i.e. replace \"
with "
).
- Input to NewRR:
"escaped\"quote"
- Expected output of result.String():
escaped"quote
- Actual output of result.String():
escaped\"quote
When we round-trip the strings by calling sprintTxt()
, we get the original input, because it ignores the \
.
If you look at https://github.com/miekg/dns/pull/1383/files, you'll see the a 2 tests, one should fail and the other should pass.
$ go test ./...
--- FAIL: TestIsPacketConn (0.00s)
client_test.go:74: unable to run test server: listen unixpacket /var/folders/j8/8pxbmmjs6xjbd4nxjpd_q0rm0000gp/T/TestIsPacketConn2401358967/002/unixpacket.sock: socket: protocol not supported
--- FAIL: TestTXTEscapeJustParse (0.00s)
parse_test.go:163: mismatch after parsing `"escaped\"quote"` TXT record: `escaped\"quote` != `escaped"quote`
FAIL
FAIL github.com/miekg/dns 0.874s
ok github.com/miekg/dns/dnsutil 0.130s
FAIL
If no de-escaping is intended, then there's no bug.
Tom
We've just run into a related issue. In this comment, X
represents a string containing the character a
253 times. Then:
- a TXT with zone file content
"X0123456789"
parses into sliceX01
+23456789
(as expected) - a TXT with zone file content
"X\;0123456789"
parses into sliceX\;
+0123456789
(this seems like a bug: the backslash and semicolon are each counted as a byte, and the string split earlier than necessary, even though this escape sequence would only produce a single byte in wire format)
The String()
function removes unnecessary backslashes, so parsing and then calling String()
on something like "X\;a" "test"
turns it into "X;" "a" "test"
, adding an unnecessary string. In my opinion, the way to go would be to remove unnecessary backslashes immediately upon parsing instead of in String()
. This seems more logical and consistent to me: a string in a parsed dns.TXT would then really only contain the contents of that string, without additional zone file syntax. AAAA addresses, for example, are also stored in canonical form (e.g., unnecessary explicit zeros are removed during parsing) so there should be no need to keep literal backslashes in TXT records, right?
(For simplicity, all strings in this comment are raw/literal strings, without Go syntax. In other words, every backslash above is a literal backslash and every quote is a literal quote, such as a quote in a zone file.)
Edit: I've opened #1540 to fix the immediate issue, though I still think it'd be smart not to store escape sequences in parsed data structures (just like JSON escape sequences in strings aren't preserved when using Unmarshal, etc.), even though that's technically a breaking change.