protobuf-parser icon indicating copy to clipboard operation
protobuf-parser copied to clipboard

Rewrite parser without nom

Open stepancheg opened this issue 7 years ago • 6 comments

I was struggling with nom lack of good diagnostic messages, and I decided that it's easier to rewrite parser without nom.

The new parser in particular properly reports error location (line and column) which was a big pain of old parser (fixes #13).

New parser correctly accepts syntax which is specified in syntax = ... header.

New parser successfully parses (although doesn't check for correctness) all protobuf .proto files:

cargo run --example test_against_protobuf_protos ~/devel/protobuf/

stepancheg avatar Apr 26 '18 18:04 stepancheg

Wow. This is a lot of work!

I agree that not using nom might be better (performance is definitely not an issue) for support purpose. I do have several comments though, some esthetics (e.g. function names not consistent ...), some more practical (I think parser should implement Iterator<Item=char> which is a line/column augmented version of std::str::char_indices).

I am wondering if making a PR on your PR is better or not.

Alternatively, I think, if you want, you should be owner of that repo because well, you've put lot of work in it and have probably rewritten most of it!

Thanks again for all your work.

tafia avatar Apr 30 '18 08:04 tafia

I tried to make it more readable and decided to try to rewrite it as lexer+parser. It will take a little time.

Alternatively, I think, if you want, you should be owner of that repo because well, you've put lot of work in it and have probably rewritten most of it!

If you intend to use it in quick-protobuf codegen, I'd appreciate if you make me an owner or give me push permissions so I could push simple changes without waiting for merge. However, I would still appreciate code review for larger changes like this one.

However, if you are not going to use this parser in quick-protobuf, I should simply merge it into the rust-protobuf project.

stepancheg avatar May 01 '18 01:05 stepancheg

Updated with lexer+parser.

stepancheg avatar May 02 '18 04:05 stepancheg

Hello! I guess it's a bit late to try and convince you to keep the nom parser, but did you see https://github.com/fflorent/nom_locate ? It solves the issue of getting line and column info for any part of the input that's returned, and for errors too.

Geal avatar May 04 '18 21:05 Geal

@Geal I've seen nom-locate, but I couldn't easily understand how to obtain line and column number of error from it.

But the biggest issue is that I found nom parser (with programmable macros) to be too hard to use: documentation is not perfect, and reading the sources is hard (you cannot cmd-click on tag! to understand what it does). I think a hand-written parser is easier to work with.

stepancheg avatar May 06 '18 08:05 stepancheg

I've copied contents of the PR into protobuf-codegen-pure crate to unblock development of new features. So protobuf-codegen-pure doesn't use protobuf-parser crate now. Maybe I'll switch to using protobuf-parser back later.

stepancheg avatar May 06 '18 08:05 stepancheg