pest Allow pest to match against byte literals

For example, if we would like to match against an occurrence of one or more actual bytes (like 7c or FFFF) we should be able to. This would take a bit of design work, however, since we make assumptions in many places in the code that we'll be dealing with UTF-8 strings.
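To make the request concrete, here is a minimal sketch of what matching a byte literal over raw input could look like. It works on a plain `&[u8]` and does not use pest at all; `match_bytes` and `match_bytes_plus` are hypothetical names chosen for illustration, not existing or planned pest APIs.

```rust
/// Hypothetical helper (not part of pest): tries to match `literal` at byte
/// offset `pos` and returns the offset just past the match on success.
fn match_bytes(input: &[u8], pos: usize, literal: &[u8]) -> Option<usize> {
    // `get` returns None instead of panicking when `pos` is out of range.
    let rest = input.get(pos..)?;
    if rest.starts_with(literal) {
        Some(pos + literal.len())
    } else {
        None
    }
}

/// Hypothetical helper: matches one or more repetitions of `literal`,
/// similar in spirit to a `+` repetition in a grammar.
fn match_bytes_plus(input: &[u8], pos: usize, literal: &[u8]) -> Option<usize> {
    // Reject an empty literal so the loop below always makes progress.
    if literal.is_empty() {
        return None;
    }
    let mut end = match_bytes(input, pos, literal)?;
    while let Some(next) = match_bytes(input, end, literal) {
        end = next;
    }
    Some(end)
}
```

Note that offsets here are arbitrary byte positions, which is exactly what string-based positions cannot promise.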

Request from @Restioson

May 28 '18 18:05 jstnlef

The redesign should tackle the issue of capturing strings from the input with the generated Spans. Currently, these guarantee cheap captures that are always valid UTF-8.

May 28 '18 21:05 dragostis

Yes hello, thanks. For more info: my use case is matching actual bytes (not literals) for capturing values in AML.

May 30 '18 13:05 Restioson

@Restioson, do you need this feature soon? Going through the design work to add this feature will probably take some time. Probably sometime after the 2.0 launch.

The big issue here is that Position and Span only work on UTF-8 borders. The types guarantee this. In order for byte parsing to work, one needs to either use completely different types (imagine a 2.x release) or rebase the current types to handle both cases somehow (3.0+).
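The UTF-8 border constraint can be illustrated with the standard library alone. The sketch below assumes a separate byte-oriented span type; `ByteSpan` and its methods are hypothetical names invented for illustration and do not exist in pest. The point is that a byte span can start or end anywhere, so its string view has to be fallible rather than guaranteed.

```rust
/// Hypothetical byte-oriented span (not a pest type): a range of raw bytes
/// with no UTF-8 guarantee.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteSpan<'i> {
    input: &'i [u8],
    start: usize,
    end: usize,
}

impl<'i> ByteSpan<'i> {
    /// Only checks that the range is in bounds; any byte offset is allowed.
    fn new(input: &'i [u8], start: usize, end: usize) -> Option<Self> {
        if start <= end && end <= input.len() {
            Some(ByteSpan { input, start, end })
        } else {
            None
        }
    }

    /// Cheap capture of the raw bytes; slicing cannot fail after `new`.
    fn as_bytes(&self) -> &'i [u8] {
        &self.input[self.start..self.end]
    }

    /// A string view is only available when the bytes happen to be valid
    /// UTF-8, so this returns an Option instead of a plain &str.
    fn as_str(&self) -> Option<&'i str> {
        std::str::from_utf8(self.as_bytes()).ok()
    }
}
```

For comparison, `"\u{e9}!".is_char_boundary(1)` is `false` and `"\u{e9}!".get(0..1)` is `None`, because offset 1 falls inside the two-byte encoding of `é`. A `&str`-based span cannot represent that range at all, while `ByteSpan::new(bytes, 0, 1)` can.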

May 30 '18 14:05 dragostis

TBH we're probably going to do it with a handwritten parser @IsaacWoods wrote a while back (but nevertheless I really like this library and would like to keep helping). So, no timeframe really.

May 30 '18 15:05 Restioson

bump, any progress on this? i might be needing this soon (am parsing IMAP with pest) so i'll probably implement a proof-of-concept in a bit

Mar 09 '21 01:03 mzhang28

Can you put it on the agenda?

Mar 14 '24 02:03 12089897411

@12089897411 I posted it as one of the ideas here: https://github.com/pest-parser/pest/discussions/885#discussioncomment-6449851 Feel free to upvote or comment on it. It likely won't be an initial priority in pest3, but once the pest3 codebase settles, it'll be more open to experimenting with changes in that regard.

Mar 14 '24 08:03 tomtau