chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Implement IterParser::chain

Open al8n opened this issue 4 months ago • 4 comments

Hi, I met a situation where I need to parse the first element by one parser, and the remaining by another parser. I tried to implement a custom parser, but I found that some traits are private and I cannot do it. Any ideas?

/// Parse the first element with `F`, then zero-or-more remaining with `R`.
pub struct SpecialFirstRepeated<F, R, O, I, E> {
  first_parser: F,
  remaining_parser: R,
  #[cfg(debug_assertions)]
  location: Location<'static>,
  _m: PhantomData<(O, I, E)>,
}

impl<F: Copy, R: Copy, O, I, E> Copy for SpecialFirstRepeated<F, R, O, I, E> {}
impl<F: Clone, R: Clone, O, I, E> Clone for SpecialFirstRepeated<F, R, O, I, E> {
  fn clone(&self) -> Self {
    Self {
      first_parser: self.first_parser.clone(),
      remaining_parser: self.remaining_parser.clone(),
      #[cfg(debug_assertions)]
      location: self.location,
      _m: PhantomData,
    }
  }
}

impl<F, R, O, I, E> SpecialFirstRepeated<F, R, O, I, E> {
  pub fn new(first_parser: F, remaining_parser: R) -> Self {
    Self {
      first_parser,
      remaining_parser,
      #[cfg(debug_assertions)]
      location: Location::caller(),
      _m: PhantomData,
    }
  }
}

impl<'src, I, E, F, R, O> Parser<'src, I, (), E> for SpecialFirstRepeated<F, R, O, I, E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
{
  #[inline(always)]
  fn go<M: Mode>(&self, inp: &mut InputRef<'src, '_, I, E>) -> PResult<M, ()> {
    // Drive via IterParser to keep progress checks similar to Repeated
    let mut count = self.make_iter::<Check>(inp)?;
    loop {
      #[cfg(debug_assertions)]
      let before = inp.cursor();

      match self.next::<Check>(inp, &mut count) {
        Ok(Some(_)) => {}
        Ok(None)    => break Ok(M::bind(|| ())),
        Err(())     => break Err(()),
      }

      #[cfg(debug_assertions)]
      debug_assert!(
        before != inp.cursor(),
        "found SpecialFirstRepeated making no progress at {}",
        self.location,
      );
    }
  }

  #[inline(always)]
    fn go_emit(&self, inp: &mut InputRef<'src, '_, I, E>) -> PResult<Emit, ()> {
      Parser::<I, (), E>::go::<Emit>(self, inp)
    }
    #[inline(always)]
    fn go_check(&self, inp: &mut InputRef<'src, '_, I, E>) -> PResult<Check, ()> {
      Parser::<I, (), E>::go::<Check>(self, inp)
    }
}

impl<'src, I, E, F, R, O> IterParser<'src, I, O, E> for SpecialFirstRepeated<F, R, O, I, E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
{
  /// `count == 0` => parse with `first_parser`, otherwise with `remaining_parser`.
  type IterState<M: Mode> = usize;

  #[inline(always)]
  fn make_iter<M: Mode>(
    &self,
    _inp: &mut InputRef<'src, '_, I, E>,
  ) -> PResult<Emit, Self::IterState<M>> {
    Ok(0)
  }

  #[inline(always)]
  fn next<M: Mode>(
    &self,
    inp: &mut InputRef<'src, '_, I, E>,
    count: &mut Self::IterState<M>,
  ) -> IPResult<M, O> {
    let before = inp.save();
    let res = if *count == 0 {
      self.first_parser.go::<M>(inp)
    } else {
      self.remaining_parser.go::<M>(inp)
    };

    match res {
      Ok(item) => {
        *count += 1;
        Ok(Some(item))
      }
      Err(()) => {
        inp.rewind(before);
        if *count == 0 {
          // first element is mandatory
          Err(())
        } else {
          Ok(None)
        }
      }
    }
  }
}

#[track_caller]
pub fn first_then_many<'src, I, E, F, R, O>(
  first: F,
  rest: R,
) -> SpecialFirstRepeated<F, R, O, I, E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
{
  SpecialFirstRepeated::new(first, rest)
}

al8n avatar Aug 19 '25 19:08 al8n

Out of curiosity, how does this differ from something like:


pub fn first_then_many<'src, I, E, F, R, O>(
  first: F,
  rest: R,
) -> impl Parser<'src, I, (), E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
{
  first.then(rest.repeated()).ignored()
}

I suppose it doesn’t implement IterParser, but the only requirement you listed was the ability to parse using one parser followed by multiple of another.

Zij-IT avatar Aug 19 '25 21:08 Zij-IT

Out of curiosity, how does this differ from something like:

pub fn first_then_many<'src, I, E, F, R, O>( first: F, rest: R, ) -> impl Parser<'src, I, (), E> where I: Input<'src>, E: ParserExtra<'src, I>, F: Parser<'src, I, O, E>, R: Parser<'src, I, O, E>, { first.then(rest.repeated()).ignored() } I suppose it doesn’t implement IterParser, but the only requirement you listed was the ability to parse using one parser followed by multiple of another.

I need to collect a Vec<O>.


#[track_caller]
pub fn first_then_many<'src, I, E, F, R, O>(
  first: F,
  rest: R,
) -> SpecialFirstRepeated<F, R, O, I, E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
{
  SpecialFirstRepeated::new(first, rest)
}

fn foo<C>() -> impl Parser<'src, I, C, E>
where
  I: Input<'src>,
  E: ParserExtra<'src, I>,
  F: Parser<'src, I, O, E>,
  R: Parser<'src, I, O, E>,
  C: Container<O>,
{
   first_then_many(first, rest).collect::<C>()
}

The use case is that, e.g.

Or =
  '=' '|'? Ident
  | Or '|' Ident

For the first element, a | is optional, but for the remaining, a | is required before the ident.

al8n avatar Aug 20 '25 06:08 al8n

If we have Chain<A, B>, it seems to be able to express this more ergonomically, just like Rust's iter combinator.

first_parser().repeated().exactly(1).chain(remaining_parser().repeated()).collect()

al8n avatar Aug 20 '25 10:08 al8n

Yes, we could do with a IterParser::chain function that works as you describe. I'm going to rename the issue.

zesterer avatar Aug 24 '25 00:08 zesterer