Feature Request: re_take_until!

Open johncf opened this issue 6 years ago • 11 comments

I am trying to parse a not-so-structured document, and this would be a nice feature to have, so that I don't have to directly rely on regex package (and for better code readability).

If you are interested, I can do a PR since it seems fairly straightforward.

johncf avatar Mar 04 '18 11:03 johncf

hello, could you tell me more about what that combinator would do?

Geal avatar Mar 11 '18 12:03 Geal

Example: re_take_until!("hello|world") would consume input until that regular expression matches.

Applying the above to "It's not the end of the world!" should return (remaining input: "world!", output: "It's not the end of the ").
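
As a rough sketch of that behaviour, the following uses only the standard library. Since std has no regex engine, the alternation hello|world is approximated by a list of literal substrings; take_until_any is a hypothetical name, not something nom provides.

```rust
/// Hypothetical helper (not part of nom): splits `input` at the earliest
/// occurrence of any of `needles`, which stand in for the regex alternatives.
/// Returns `(remaining, output)`, mirroring the order nom uses in its results.
fn take_until_any<'a>(input: &'a str, needles: &[&str]) -> Option<(&'a str, &'a str)> {
    // Earliest byte offset at which any alternative starts, if there is one.
    let index = needles.iter().filter_map(|n| input.find(*n)).min()?;
    // `find` returns a character boundary, so `split_at` cannot panic here.
    let (output, remaining) = input.split_at(index);
    Some((remaining, output))
}

// take_until_any("It's not the end of the world!", &["hello", "world"])
// evaluates to Some(("world!", "It's not the end of the ")).
```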

johncf avatar Mar 11 '18 18:03 johncf

Honestly, a regex-based combinator would be absolutely amazing to have.

I don't doubt for one moment that nom can do everything regex can do, but the succinctness of being able to write something like "[^@]+@[^@\.]+\.\w+" as a rudimentary email address parser is appealing.

If you're then able to throw that into the greater nom ecosystem, that would be splendid.

ElectricCoffee avatar Jan 29 '19 21:01 ElectricCoffee

"I don't doubt for one moment that nom can do everything regex can do"

Without diving into the academic way of looking at this statement, I don't think there is a nom equivalent of this particular proposal. The take_* macros only do T -> bool or &[T], not other whole parsers.

cormacrelf avatar Mar 24 '19 12:03 cormacrelf

There are a lot of regex-based combinators; you can find them by looking for the prefix re_ on https://docs.rs/nom/4.2.3/nom/

Geal avatar Mar 24 '19 12:03 Geal

Not sure if replying to me, but to clarify, this doesn't exist (yet), so I just wrote it myself:

// `take_till_match!(alt!(tag!("John") | tag!("Amanda")))`
// Running that on `"Hello, Amanda"` gives `Ok(("Amanda", "Hello, "))`
macro_rules! take_till_match(
  (__impl $i:expr, $submac2:ident!( $($args2:tt)* )) => (
    {
      use $crate::lib::std::result::Result::*;
      use $crate::lib::std::option::Option::*;

      // TODO: replace nom with $crate
      use nom::{Err, Needed,need_more_err, ErrorKind};
      use nom::InputLength;
      use nom::FindSubstring;
      use nom::InputTake;
      use nom::Slice;

      let ret;
      let input = $i;
      let mut index = 0;

      loop {
        let slice = input.slice(index..); // XXX: this is bad with multi-byte unicode
        match $submac2!(slice, $($args2)*) {
          Ok((_i, _o)) => {
            ret = Ok(input.take_split(index));
            break;
          },
          Err(_e1)    => {
            if index >= input.len() {
                // XXX: this error is dramatically wrong
                ret = need_more_err(input, Needed::Size(0), ErrorKind::TakeUntil::<u32>);
                break;
            } else {
                index += 1;
            }
          },
        }
      }

      ret
    }
  );
  ($i:expr, $submac2:ident!( $($args2:tt)* )) => (
    take_till_match!(__impl $i, $submac2!($($args2)*));
  );
  ($i:expr, $g:expr) => (
    take_till_match!(__impl $i, call!($g));
  );
);

cormacrelf avatar Mar 24 '19 13:03 cormacrelf

I took @cormacrelf 's macro and made some changes.

First, I added a trait to allow "safe-slicing" of strings.

Secondly, I modified the macro to make use of the trait.

lawliet89 avatar Apr 11 '19 13:04 lawliet89

@lawliet89 that's closer, but you could reuse existing APIs by making the trait give you an Iterator instead. Just abstract &str::char_indices().map(|(i, _)| i) and create an index++ version for byte slices. Here's what I ended up using in my code:

{
      let input = $i;
      for index in input.char_indices().map(|(i, _)| i) {
        let slice = input.slice(index..);
        match $submac2!(slice, $($args2)*) {
          Ok((_i, _o)) => {
            return Ok(input.take_split(index));
          },
          Err(_e1) => { },
        }
      }
      need_more_err(input, Needed::Size(0), ErrorKind::TakeUntil::<u32>)
}
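
The suggestion above could be sketched outside of nom roughly as follows. This is an illustration under assumptions, not the code from either comment: the trait name SplitPoints, the Option-based parser signature, and the name helper (a stand-in for alt!(tag!("John") | tag!("Amanda"))) are all hypothetical.

```rust
/// Hypothetical trait: yields every offset at which the input may be split.
trait SplitPoints {
    fn split_points(&self) -> Box<dyn Iterator<Item = usize> + '_>;
}

impl SplitPoints for str {
    // Only character boundaries, so slicing never lands inside a multi-byte character.
    fn split_points(&self) -> Box<dyn Iterator<Item = usize> + '_> {
        Box::new(self.char_indices().map(|(i, _)| i))
    }
}

impl SplitPoints for [u8] {
    // The "index++" version: every byte offset is a valid split point.
    fn split_points(&self) -> Box<dyn Iterator<Item = usize> + '_> {
        Box::new(0..self.len())
    }
}

/// Function form of the loop for `&str`: returns `(remaining, consumed)` at the
/// first offset where `parser` succeeds, or `None` if it never succeeds.
fn take_till_match<'a, O>(
    input: &'a str,
    parser: impl Fn(&'a str) -> Option<(&'a str, O)>,
) -> Option<(&'a str, &'a str)> {
    for index in input.split_points() {
        if parser(&input[index..]).is_some() {
            let (consumed, remaining) = input.split_at(index);
            return Some((remaining, consumed));
        }
    }
    None
}

/// Stand-in parser: succeeds if the input starts with "John" or "Amanda".
fn name(input: &str) -> Option<(&str, &str)> {
    ["John", "Amanda"]
        .iter()
        .find(|tag| input.starts_with(**tag))
        .map(|tag| (&input[tag.len()..], *tag))
}
```

With these definitions, take_till_match("Hello, Amanda", name) gives Some(("Amanda", "Hello, ")), and an input such as "héllo, John!" is handled without panicking, whereas stepping one byte at a time would try to slice inside the é.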

cormacrelf avatar Apr 11 '19 14:04 cormacrelf

@cormacrelf Thanks for your suggestion! Made some changes and it looks much better.

lawliet89 avatar Apr 12 '19 01:04 lawliet89

Hey, I just stumbled upon this issue. I actually have a PR open for a take_until_parser_matches combinator; it seems like we could then just pass a regex parser as the parameter, like any other nom parser, solving the deficiency @cormacrelf pointed out in https://github.com/Geal/nom/issues/709#issuecomment-475954895 . From my first-pass reading of @cormacrelf's code in https://github.com/Geal/nom/issues/709#issuecomment-475958529 , mine functions in a very similar way, except it's a function instead of a macro, and it looks like @cormacrelf's supports streaming whereas mine does not.

Unfortunately it seems Geal is very busy right now so I have no idea when it'll get eyes on it again.

PR: https://github.com/Geal/nom/pull/469

tomalexander avatar Dec 19 '20 19:12 tomalexander

I'd propose closing this as regex functions are no longer present in this crate. I've opened up a new issue on nom-regex, https://github.com/rust-bakery/nom-regex/issues/3, to continue the request.

I don't think take_until_parser_matches is a good solution here, as iterating a regex-containing parser multiple times essentially redoes the work of a Regex "find" function, and thus eliminates a big performance benefit of using regex for this.
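
To illustrate the concern, here is a small std-only sketch. It makes no claim about how nom or the regex crate are implemented: scan_with_retries is a hypothetical stand-in for retrying a parser at every offset, and str::find stands in for a regex engine's single-pass find.

```rust
/// Hypothetical stand-in for the retry approach: tries `matches_here` at every
/// character offset and reports the first matching offset plus the attempt count.
fn scan_with_retries(
    input: &str,
    matches_here: impl Fn(&str) -> bool,
) -> (Option<usize>, usize) {
    let mut attempts = 0;
    for (index, _) in input.char_indices() {
        attempts += 1; // one full parser invocation per candidate offset
        if matches_here(&input[index..]) {
            return (Some(index), attempts);
        }
    }
    (None, attempts)
}

/// Stand-in for a single "find" call that searches the whole input in one go.
fn single_search(input: &str, needle: &str) -> Option<usize> {
    input.find(needle)
}
```

Both approaches locate the same offset, but the retry version restarts the inner matcher once per candidate position, and that repeated work is what a dedicated find would avoid.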

daboross avatar Mar 08 '23 01:03 daboross