nom_locate icon indicating copy to clipboard operation
nom_locate copied to clipboard

Feature for easier test bootstrapping

Open pitaj opened this issue 3 years ago • 12 comments

While implementing a recent parser, I came upon an annoyance: manually slicing spans when writing test cases was very tedious.

So I modified nom_locate with a feature that allows for much easier testing by ignoring location information during test. It works by utilizing the fact that line should never be zero in normal operation:

    /// Create a span for use in testing
    /// 
    /// Spans created this way will be tested for equality purely on the fragment,
    /// ignoring the `offset` and `line`.
    #[cfg(feature = "easy-test")]
    pub fn test_new_extra(program: T, extra: X) -> LocatedSpan<T, X> {
        LocatedSpan {
            offset: 0,
            line: 0,
            fragment: program,
            extra,
        }
    }

#[cfg(feature = "easy-test")]
impl<T: AsBytes + PartialEq, X> PartialEq for LocatedSpan<T, X> {
    fn eq(&self, other: &Self) -> bool {
        if self.line == 0 || other.line == 0 {
            self.fragment == other.fragment
        } else {
            self.line == other.line && self.offset == other.offset && self.fragment == other.fragment
        }
    }
}

This allows me to define a function to easily use in tests:

function sp(s: &str) -> Span {
    Span::test_new_extra(s, Default::default())
}

And use it in a test like so:

assert_eq!(
    path(sp("a.b.c, what")),
    Ok((
        sp(", what"),
        Expression::Path {
            span: sp("a.b.c"),
            path: vec![
                PathPart::Part(sp("a")),
                PathPart::Part(sp("b")),
                PathPart::Part(sp("c"))
            ]
        }
    ))
);

I've never run into a case where span location was an issue, since it's always automatically derived correctly. So testing it in tests is a lot of tedium for very little if any benefit.

Let me know if there's a way of doing this I'm missing.

pitaj avatar Nov 22 '20 18:11 pitaj

We could implement PartialEq<T> for LocatedSpan<T>. It should work if your Expression and PathPart are generic in the string type, right?

progval avatar Nov 22 '20 18:11 progval

Good idea! Those aren't generic in my program, but let me experiment and see if that'll work.

pitaj avatar Nov 22 '20 19:11 pitaj

Okay I got it working, but it requires a custom impl PartialEq for each type you wish to do this on, because PartialEq is not generic over the right-hand side. It can get kinda annoying with large enums

impl<S: PartialEq<O>, O> PartialEq<Expression<O>> for Expression<S> {
    fn eq(&self, other: &Expression<O>) -> bool {
        match (self, other) {
            (Expression::StringLiteral(span), Expression::StringLiteral(other_span)) => {
                span == other_span
            }
            (Expression::Path { span, path }, Expression::Path { span: other_span, path: other_path }) => {
                span == other_span && path == other_path
            }
            (Expression::Negative { span, expr }, Expression::Negative { span: other_span, expr: other_expr }) => {
                span == other_span && **expr == **other_expr
            }
            (Expression::Helper { span, name, args }, Expression::Helper { span: other_span, name: other_name, args: other_args }) => {
                span == other_span && name == other_name && args == other_args
            }
            (Expression::LegacyHelper { span, name, args }, Expression::LegacyHelper { span: other_span, name: other_name, args: other_args }) => {
                span == other_span && name == other_name && args == other_args
            }
            _ => false
        }
    }
}

I also haven't been able to find a crate that provides a derive macro or anything to provide a better experience here, though I'm sure it's possible to create one.

I still think this is a better approach than my suggestion (it's really nice not needing to wrap every string literal in sp()).

pitaj avatar Nov 22 '20 21:11 pitaj

This may be a dealbreaker though:

Because Result, Option, tuples, and other built-in types don't implement PartialEq like this, tests have to use .unwrap().1 all over the place because the following can't work:

        assert_eq!(
            path(sp("@value.c")),
            Ok((
                ".c",
                Expression::Path {
                    span: "@value",
                    path: vec![PathPart::Part("@value")]
                }
            ))
        );

Which means you either have to split checking rest and the parser output into two assertions, or just ignore one side or the other.

pitaj avatar Nov 22 '20 22:11 pitaj

Hmm yeah.

Did you consider writing a function that converts YourTree<LocatedSpan<T>> into YourTree<T>? If your parse tree type isn't too large, it shouldn't be too much trouble.

Or alternatively, don't use LocatedSpan at all in most of your tests, by having your parser operate without nom_locate (with a few tests to make sure it still works with nom_locate, ofc)

progval avatar Nov 22 '20 22:11 progval

How would you go about writing a generic parser like that without specifying tons of trait bounds all over the place? I tried defining one trait which has all of the rest in where Self: ... but I can't figure out how to get that to work for some nom requirements like <Self as nom::InputTakeAtPosition>::Item: nom::AsChar. I also can't figure out a way to allow str methods like .ends_with because span impls Deref<Target = &str> while &str impls Deref<Target = str>

pitaj avatar Nov 22 '20 23:11 pitaj

Hmm yeah, you're right, I didn't consider you could need those.

progval avatar Nov 23 '20 09:11 progval

Did you consider writing a function that converts YourTree<LocatedSpan<T>> into YourTree<T>? If your parse tree type isn't too large, it shouldn't be too much trouble.

I ended up just doing this, and making my own assert_eq! that calls it.

pitaj avatar Nov 23 '20 17:11 pitaj

Cool! Can I consider this issue solved?

progval avatar Nov 23 '20 17:11 progval

Well I was able to implement a solution that worked for me, but I think this is still a pain point and warrants further study.

Also, if the default behavior of derive(PartialEq) ever changes to become more generic, it could enable your prior suggested solution.

I'd suggest keeping this issue open as a reminder.

pitaj avatar Nov 23 '20 19:11 pitaj

I also had to write tests while trying to add nom_locate to an existing parser. I ended up with two traits to add (resp. remove) span information as needed for assertions: https://github.com/vtavernier/glsl/blob/aec07998d7882d92d8eb2d835379321056782aef/glsl/src/parse_tests.rs#L5-L107

/// Trait to add span information. ParseInput is a type alias for nom_locate::Span
trait Span<'c, 'd, 'e>: Sized {
  fn span(self) -> crate::parsers::ParseInput<'c, 'd, 'e>;
}

impl<'c> Span<'c, 'static, 'static> for &'c str {
  fn span(self) -> crate::parsers::ParseInput<'c, 'static, 'static> {
    crate::parsers::ParseInput::new_extra(self, None)
  }
}

// impls for other AST nodes

/// Trait to remove span information, to be implemented on syntax nodes
trait Unspan: Sized {
  type Unspanned: Sized;

  fn unspan(self) -> Self::Unspanned;
}

/// Simplest impl: nom_locate::Span back to str
impl<'c> Unspan for ParseInput<'c, '_, '_> {
  type Unspanned = &'c str;

  fn unspan(self) -> Self::Unspanned {
    self.fragment()
  }
}

// impls for Result, Option, etc.

/// Usage example:
assert_eq!(parser("test".span()).unspan(), Ok("test"))

The rest of the test harness relies on the fact that AST nodes are wrapped in a Node<T> which implements a trait to only compare node contents (regardless of span information): https://github.com/vtavernier/glsl/blob/aec07998d7882d92d8eb2d835379321056782aef/glsl/src/syntax/node.rs#L146

Regardless of the approach chosen I think we still need to have a way to compare span information for tests when needed, so having PartialEq implemented for LocatedSpan "ignoring" the span information wouldn't be the best bet IMO.

alixinne avatar Dec 01 '20 11:12 alixinne

I came across this thread after I solved this (in yet a different way):

    struct EasySpan<T, X> {
        input: LocatedSpan<T, X>,
    }

    impl<T, X> EasySpan<T, X>
    where
        LocatedSpan<T, X>: Slice<RangeFrom<usize>> + Slice<RangeTo<usize>>,
    {
        fn next(&mut self, count: usize) -> LocatedSpan<T, X> {
            let (after, before) = self.input.take_split(count);
            self.input = after;
            before
        }
    }

    #[test]
    fn test_tokenise() {
        let input = Span::new("A  1234");

        let mut span = EasySpan { input };
        let want = Tokens::new(&[
            Token::new(TokenType::Word, span.next(1)),
            Token::new(TokenType::Whitespace, span.next(2)),
            Token::new(TokenType::Number, span.next(4)),
        ]);
        let got = tokenise(input).expect("token parsing failed");
        assert_eq!(got, want);
    }


bramp avatar Jun 24 '21 03:06 bramp