domain icon indicating copy to clipboard operation
domain copied to clipboard

Introduce a new `zonefile` module

Open bal-e opened this issue 9 months ago • 10 comments

This is a work-in-progress PR for overhauling domain::zonefile with a brand-new, out-of-place zonefile parser. It integrates a decent number of unit tests, but is currently lacking many record types (because new-base doesn't have them yet).

  • [x] Add an example program which uses new_zonefile for parsing.
  • [x] Test the implementation against large sample zonefiles.
  • [x] Benchmark this against the old implementation.
  • [x] Add support for more record data types (in this PR or new-base).

The new zonefile parser has a few basic differences from the existing implementation:

  • Records borrow from the parser object to avoid unnecessary allocations.
  • The actual parsing logic operates on strings instead of symbols.
  • Entry-parsing code is separated from per-entry parsing code.

The new zonefile parser has been designed to support parallelized zonefile parsing in the future. I am leaving this to a later PR.

bal-e avatar Mar 03 '25 13:03 bal-e

Migrated some DNSSEC related commits to new-base, and had to force-push to resolve some mistakes during the cherry-pick.

bal-e avatar Mar 07 '25 11:03 bal-e

While testing this PR I noticed that on parsing failure the reported error gives the line number of the problematic record but says things like "could not parse the owner name: irregular character in domain name" without saying which character was considered irregular, or "could not parse the record data: unrecognized record type" without saying which record type.

I appreciate that the reason for this is to be able to use &'static str as the error message type, but am generally in favour of giving as much context on error as possible.

For the irregular character case one cannot know which character is the problem without a trial and error / divide and conquer approach to change the record and reload, change the record and reload, which could be very frustrating.

For the unrecognized record type the line number should be enough to make this clear, but only if parsing a file as input. If the input were XFR and not written out prior to parsing the reader of the error would have no idea which record caused the problem.

ximon18 avatar Apr 07 '25 10:04 ximon18

@ximon18: I agree that the user experience here isn't the best. But the parsing system has a lot of inherent limitations with regards to error reporting. It can only report one error at a time, so a zonefile with multiple errors would still require many invocations to get working. It also provides very limited context: most parsing functions can't tell which part of a record they're being used in (e.g. RType can be parsed for the type of the record, but also for an NSEC type bitmap). Providing useful, contextual error messages is going to be hard.

Instead of making the parsing system more complicated, I think users are better served if we can show them the specification the parser is following. I've documented the zonefile spec (that I'm using) in this PR; perhaps we can print it or link to it from user-facing programs that perform zonefile parsing. For example, dnst read-zone could have a --spec option which prints out the specification.

With regards to the correctness of the specification (i.e. that the actual implementation follows the documented specification correctly): perhaps we can use fuzzing here. It seems that grammar-based fuzzers do exist, and we could set that up. I'd like to do that in a separate PR, though -- I suspect it'll get pretty complicated.

bal-e avatar Apr 07 '25 13:04 bal-e

@ximon18: I agree that the user experience here isn't the best. But the parsing system has a lot of inherent limitations with regards to error reporting. It can only report one error at a time, so a zonefile with multiple errors would still require many invocations to get working. It also provides very limited context: most parsing functions can't tell which part of a record they're being used in (e.g. RType can be parsed for the type of the record, but also for an NSEC type bitmap). Providing useful, contextual error messages is going to be hard.

Instead of making the parsing system more complicated, I think users are better served if we can show them the specification the parser is following. I've documented the zonefile spec (that I'm using) in this PR; perhaps we can print it or link to it from user-facing programs that perform zonefile parsing. For example, dnst read-zone could have a --spec option which prints out the specification.

With regards to the correctness of the specification (i.e. that the actual implementation follows the documented specification correctly): perhaps we can use fuzzing here. It seems that grammar-based fuzzers do exist, and we could set that up. I'd like to do that in a separate PR, though -- I suspect it'll get pretty complicated.

I'm concerned that referring operators to a spec document somewhere when there's an error due to something in the received zonefile bytes asking them to somehow work it out is going to be usable. In the previous parser implementation I added the missing context by passing out a String instead of &'static str, as the parser knew which value it just failed to parse. Is that not possible here?

ximon18 avatar Apr 07 '25 13:04 ximon18

I think I misunderstood the use case you're thinking about. I was considering this API from the perspective of a user manually executing a program that is loading a zonefile (which is the perspective of us as developers trying to test out the zonefile functionality). What information would we need to provide DNS operators (who are presumably working with a large number of zones, and have an automated system that is loading zonefiles) when an error occurs? What would their immediate response to such a failure be?

To answer the concrete question: it is possible to return a String for the error message here. It's just that the parser has very little context about where it's getting called from (e.g. whether the parser is located before or within the RDATA field).

bal-e avatar Apr 07 '25 14:04 bal-e

I think I misunderstood the use case you're thinking about. I was considering this API from the perspective of a user manually executing a program that is loading a zonefile (which is the perspective of us as developers trying to test out the zonefile functionality). What information would we need to provide DNS operators (who are presumably working with a large number of zones, and have an automated system that is loading zonefiles) when an error occurs? What would their immediate response to such a failure be?

To answer the concrete question: it is possible to return a String for the error message here. It's just that the parser has very little context about where it's getting called from (e.g. whether the parser is located before or within the RDATA field).

The use case is that a running server attempts to read a zone file from disk or records from an incoming XFR and for some reason is unable to parse the zone. The operator then needs to be able to know which record could not be parsed, which may require more than a line number and a generic error message not specific to the record in question. Am I missing something here that makes this wish seem strange?

ximon18 avatar Apr 07 '25 14:04 ximon18

I think it makes sense. A parser should give at least the string value that could not be parsed, but ideally also the context surrounding that string.

Philip-NLnetLabs avatar Apr 07 '25 14:04 Philip-NLnetLabs

My concern is, DNS operators are probably not writing zonefiles by hand; if they're generating it from an automated system, and the output is malformed (at least according to this zonefile parser implementation), they can't just modify the zonefile by hand in response to the error message (although perhaps that is helpful for fixing the immediate issue). They will need to update their automated zonefile generation to avoid making the same mistake in the future. From that perspective, I'd imagine that having a formal specification of the acceptable grammar is far more useful than a thorough description of a single parse error.

bal-e avatar Apr 07 '25 14:04 bal-e

My concern is, DNS operators are probably not writing zonefiles by hand; if they're generating it from an automated system, and the output is malformed (at least according to this zonefile parser implementation), they can't just modify the zonefile by hand in response to the error message (although perhaps that is helpful for fixing the immediate issue). They will need to update their automated zonefile generation to avoid making the same mistake in the future. From that perspective, I'd imagine that having a formal specification of the acceptable grammar is far more useful than a thorough description of a single parse error.

Indeed I don't expect operators to write zone files by hand. They will need to report the error to someone to get it solved in our software. We will then ask them "which record caused the error" and they will not know and may not wish to give out their raw data if they even have it.

ximon18 avatar Apr 07 '25 14:04 ximon18

We will then ask them "which record caused the error" and they will not know [...]

The parse error still reports the line number, so it's possible to find the failing record. Maybe this is harder if the zonefile is ephemeral, but I'd still expect it to be found on disk.

and may not wish to give out their raw data if they even have it.

Suppose an operator does encounter a zonefile parsing issue, and they are unable to share the original record (or the specific component of it that appeared to fail) because it is sensitive data. If we were to include that information in the error message, doesn't that just make the error message sensitive data too?

Still, I understand that having a bit more context in parse errors would be helpful. I'll update ParseError to return heap-allocated strings, and then use format!() to provide some additional information. I'll also fix the specific example that motivated this discussion, but there are other places which would benefit from this (e.g. reporting parse failures for IP addresses, which delegate to functions in std).

For providing additional context: calculating the character offset of a parse failure is surprisingly difficult because of parentheses. Parentheses are handled outside the Scanner type (in Entries) and are stripped by it; this simplifies the parsing logic in Scanner, but makes it impossible to determine character offsets in the original input. Changing this will not be pleasant, but it can be done if it would seriously benefit end users.

bal-e avatar Apr 07 '25 14:04 bal-e