chumsky
`binary` mod to match `text`
Proposal
A `binary` module to match `text`, with similar helpers but for non-textual input.
TL;DR Explanation
The `binary` module would have functions for parsers like `int` and `string`, which would interpret the relevant types from `u8` streams.
Full Explanation
The proposed module would contain helpers for parsing binary files, in the same vein as there already exist helpers for parsing text.
Some helpers that would be nice to have (exact signatures up for bikeshedding, main point is what they'd allow):
- `int<I>(endian: Endian)` for reading `size_of::<I>()` bytes as an integer of a particular type and endian
- `float<F>(endian: Endian)` same for floats. Maybe combine these
- `string(ty: StringTy)` for reading strings. The type would be like, 'null terminated' or 'length prefixed'
- Possibly other types as well, such as `Vec`, arrays, or similar.
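To make the list above concrete, here is a rough sketch of what two of these helpers might do, written as plain functions over `&[u8]` rather than as chumsky parsers. Everything here is an assumption for illustration: `Endian` and `StringTy` only borrow their names from the proposal, `int_u32` stands in for the generic `int<I>`, and the `LengthPrefixedU8` variant assumes a one-byte length prefix. None of this is existing chumsky API.

```rust
use std::convert::TryInto;

/// Byte order for multi-byte values (hypothetical; name taken from the proposal).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endian {
    Little,
    Big,
}

/// How a string is delimited in the input (hypothetical; name taken from the proposal).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StringTy {
    /// Bytes up to, but not including, a terminating 0 byte.
    NullTerminated,
    /// One length byte, followed by that many bytes of string data.
    LengthPrefixedU8,
}

/// Read a `u32` from the front of `input`.
/// Returns the value and the remaining input, or `None` if fewer than 4 bytes are left.
pub fn int_u32(input: &[u8], endian: Endian) -> Option<(u32, &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let (head, rest) = input.split_at(4);
    let bytes: [u8; 4] = head.try_into().ok()?;
    let value = match endian {
        Endian::Little => u32::from_le_bytes(bytes),
        Endian::Big => u32::from_be_bytes(bytes),
    };
    Some((value, rest))
}

/// Read a UTF-8 string from the front of `input`.
/// Returns the string and the remaining input, or `None` if the input is
/// truncated, unterminated, or not valid UTF-8.
pub fn string(input: &[u8], ty: StringTy) -> Option<(String, &[u8])> {
    let (raw, rest) = match ty {
        StringTy::NullTerminated => {
            let end = input.iter().position(|&b| b == 0)?;
            // Skip the terminator itself when computing the remainder.
            (&input[..end], &input[end + 1..])
        }
        StringTy::LengthPrefixedU8 => {
            let (&len, tail) = input.split_first()?;
            let len = len as usize;
            if tail.len() < len {
                return None;
            }
            tail.split_at(len)
        }
    };
    let text = std::str::from_utf8(raw).ok()?.to_owned();
    Some((text, rest))
}
```

Real combinators would return a parser value instead of taking the input directly, and would report an error instead of `None`, but the byte-level work would presumably look similar.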
I think chumsky has good potential as a tool for parsing non-textual files as well as text-based ones, but these kinds of primitive operations are currently missing. With just a handful of these basic tools, parsing binary files could be just as painless as any other type.
While I'm not outright opposed, I'm a little reluctant to implement these for a few reasons:
- Chumsky is not intended for high-performance parsing; it's designed for high-quality error messages. This makes it less useful in the domain of binary parsing given that parsers like `nom` exist. For most cases, I'm sure that `nom`'s binary parsing is likely several times faster than Chumsky's, at least.
- The potential scope of this seems unbounded and is probably better covered by dedicated binary parsing crates like `bincode`, `bson`, etc.
I think it's fair to say that Chumsky is pretty specifically oriented towards the parsing of text for the purpose of generating human-readable error messages, so I'm worried that this falls outside the scope of the library and might become quite difficult to maintain.
That's fair. I was honestly thinking that performance still wasn't the primary goal - I tend to write binary parsers for reverse engineering purposes, a situation where `chumsky`'s focus on errors and ease of iteration over speed seems perfect. Often the trickiest part of parsing an unknown format is figuring out how your parser is wrong, and good errors can make that significantly easier.
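As an illustration of the kind of error that helps when reverse engineering, here is a minimal sketch of a binary parse error that records the byte offset, what was expected, and what was actually found. `BinError` and `expect_magic` are hypothetical names made up for this example; they are not chumsky's error types.

```rust
use std::fmt;

/// A hypothetical error for binary parsing: where it failed, and why.
#[derive(Debug, PartialEq)]
pub struct BinError {
    /// Byte offset into the input at which the mismatch occurred.
    pub offset: usize,
    /// Human-readable description of what the parser wanted.
    pub expected: String,
    /// The byte actually present, or `None` at end of input.
    pub found: Option<u8>,
}

impl fmt::Display for BinError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.found {
            Some(b) => write!(
                f,
                "at byte {:#06x}: expected {}, found {:#04x}",
                self.offset, self.expected, b
            ),
            None => write!(
                f,
                "at byte {:#06x}: expected {}, found end of input",
                self.offset, self.expected
            ),
        }
    }
}

/// Check that `input` contains `magic` starting at `start`.
/// On success, returns the offset just past the magic bytes.
pub fn expect_magic(input: &[u8], start: usize, magic: &[u8]) -> Result<usize, BinError> {
    for (i, &want) in magic.iter().enumerate() {
        let offset = start + i;
        match input.get(offset) {
            Some(&got) if got == want => {}
            // Either a different byte or no byte at all: report exactly where.
            other => {
                return Err(BinError {
                    offset,
                    expected: format!("{:#04x}", want),
                    found: other.copied(),
                });
            }
        }
    }
    Ok(start + magic.len())
}
```

With an unknown format, a message like "at byte 0x0003: expected 0x46, found 0x47" points directly at the byte where the parser's assumptions stopped matching the file.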
Scope could be an issue, though I think it would be fine to pick a set of 'basic' operations, and anything that starts asking thorny implementation questions could be pushed to an external extension. All of it could be an extension, but I feel just accepting a limited set of basic operations is a sufficiently minimal scope for sufficiently high usefulness.
I understand if in the end you don't want to do this, but I think it would at least be useful for some cases.
This could be a useful thing to have in an extension crate, particularly when I add support for zero-copy parsing (see #9).