froto
Parser Improvements
The parser currently turns `proto2` and `proto3` correctly into an AST, but does no further processing or semantic validation.
Proposed additions:
- [x] Process `import` statements #109
- [ ] Normalize type names
- [ ] Flatten type names
- [ ] Validate type usage
- [ ] Resolve constants
- [ ] Capture comments
Process `import` statements
Parser methods need to be enhanced with, e.g., a function parameter that can satisfy includes. For `Parse.fromFile`, this could be an include path or paths. How to provide this for strings and streams is an open question.
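One way to picture the "function parameter that can satisfy includes" is a resolver callback that maps an import name (as written in the .proto file) to source text. The sketch below is illustrative only, written in Python for brevity rather than F#; none of these names exist in froto, and the real result would pair filenames with PProto ASTs rather than raw source.

```python
import re

# Matches `import "foo.proto";` and `import public "foo.proto";`
IMPORT_RE = re.compile(r'^\s*import\s+(?:public\s+)?"([^"]+)"\s*;', re.MULTILINE)

def parse_with_imports(name, resolve, seen=None):
    """Return [(filename, source)] for `name` and everything it imports.

    `resolve` maps an import path to source text, e.g. by searching a list
    of include directories. `seen` guards against repeated/diamond imports.
    """
    seen = seen if seen is not None else set()
    if name in seen:
        return []
    seen.add(name)
    source = resolve(name)
    results = [(name, source)]
    for imported in IMPORT_RE.findall(source):
        results.extend(parse_with_imports(imported, resolve, seen))
    return results

# A resolver over an in-memory "include path"; a file-based resolver would
# probe each include directory instead.
files = {
    "a.proto": 'import "b.proto";\nmessage A {}',
    "b.proto": 'message B {}',
}
parsed = parse_with_imports("a.proto", files.__getitem__)
```

Because the resolver is just a function, the same entry point could serve files, strings, and streams, which is one possible answer to the open question above.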
Normalize Type Names
Convert various naming conventions into a single normalized form. Should support outputting in caller-selectable camelCase, PascalCase, snake_case, etc.
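The usual approach is to split an identifier into words first, then re-join in the caller-selected convention; that way snake_case, camelCase, and PascalCase inputs all normalize the same way. A minimal sketch (in Python for illustration; the function names are invented):

```python
import re

def split_words(name):
    """Split an identifier into lowercase words, handling snake_case,
    camelCase, and PascalCase input."""
    words = []
    for part in name.split("_"):
        # break "FooBar"/"fooBar" at lower->upper boundaries; keep acronyms
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words if w]

def to_snake(name):
    return "_".join(split_words(name))

def to_pascal(name):
    return "".join(w.capitalize() for w in split_words(name))

def to_camel(name):
    words = split_words(name)
    return words[0] + "".join(w.capitalize() for w in words[1:])
```

The caller-selectable part then reduces to passing one of these functions (or an equivalent option) into the normalization step.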
Flatten Type Names
Messages can contain inner message and enum definitions. These are publicly visible and can be referenced from other types. It will probably be convenient to flatten these symbols, moving them from an inner scope to the top-level scope and renaming each type with its fully-qualified scope name; e.g., "Outer.Inner".
In addition, Record types cannot contain inner types. Since .NET can handle "+" in a symbol name, the names could be normalized from "Outer.Inner" to "Outer+Inner" either in this step, or when doing code-gen for Records.
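The flattening step itself is a straightforward recursive walk. The sketch below uses an invented AST shape, (name, nested_definitions), and is in Python for illustration; passing a "+" separator yields the .NET-friendly "Outer+Inner" form mentioned above.

```python
def flatten(defs, prefix="", sep="."):
    """Lift nested definitions into a flat {qualified_name: children} map.

    defs: list of (name, nested_definitions) pairs (hypothetical AST shape).
    """
    flat = {}
    for name, children in defs:
        qualified = prefix + sep + name if prefix else name
        flat[qualified] = children
        # recurse so arbitrarily deep nesting is lifted as well
        flat.update(flatten(children, qualified, sep))
    return flat

nested = [("Outer", [("Inner", [])])]
```

With `sep="."` this produces keys "Outer" and "Outer.Inner"; with `sep="+"`, "Outer" and "Outer+Inner".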
Validate Type Usage
Ensure that every `message` used as a field type and every `enum` used as a constant is defined somewhere. Must handle use before definition (so this cannot be a single pass). Must also handle names qualified with the `package` name, as well as unqualified names resolved against the current .proto file's package.
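The two-pass shape this implies: first collect every defined type under its fully-qualified name, then check each reference, trying the referencing file's own package before the name as written. The data layout below is invented for illustration (Python, not F#), and real protobuf resolution also searches enclosing scopes, which this sketch omits.

```python
def validate(files):
    """files: {package: {"types": [...], "uses": [...]}} -> list of errors."""
    defined = set()
    for package, contents in files.items():          # pass 1: definitions
        for type_name in contents["types"]:
            defined.add(f"{package}.{type_name}" if package else type_name)
    errors = []
    for package, contents in files.items():          # pass 2: usages
        for used in contents["uses"]:
            # try the current file's package first, then the name as written
            local = f"{package}.{used}" if package else used
            if local not in defined and used not in defined:
                errors.append(f"unresolved type: {used}")
    return errors

files = {
    "pkg.a": {"types": ["Foo"], "uses": ["Foo", "pkg.b.Baz", "Missing"]},
    "pkg.b": {"types": ["Baz"], "uses": []},
}
```

Because all definitions are gathered before any usage is checked, use-before-definition falls out naturally.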
Resolve Constants
An `enum` can be used as a constant in an Option definition. Consider either resolving these, or exposing the symbol table used during type-usage validation so that client code can do the substitution.
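Either way, the substitution itself is a symbol-table lookup. A minimal sketch (Python for illustration; the names and table shape are invented), where the table maps fully-qualified enum constant names to their values:

```python
def resolve_option(value, symbols):
    """Substitute an enum-constant option value via the symbol table.

    Returns the numeric value for a known enum-constant name, or the value
    unchanged if it is already a literal (int/bool/string).
    """
    if isinstance(value, str) and value in symbols:
        return symbols[value]
    return value

# hypothetical table built during type-usage validation
symbols = {"MyEnum.FIRST": 0, "MyEnum.SECOND": 1}
```

Exposing `symbols` instead of resolving in place would let client code choose when (or whether) to substitute.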
Capture Comments
Google's protobuf compiler can capture comments related to a symbol. This can be quite useful for generating documentation, such as in a type provider. It is also needed to create a full FileDescriptorSet (a binary protobuf representation of a set of .proto files).
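The "separate table of comments" option mentioned below can be approximated as: accumulate comment lines and attach them to the next non-comment line. This is a crude line-based sketch in Python (protoc's real SourceCodeInfo tracks source locations, not text keys), purely to illustrate the shape of such a table:

```python
def capture_comments(lines):
    """Return {declaration_line: comment_text} for '//' comments that
    directly precede a non-comment line."""
    comments, pending = {}, []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("//"):
            pending.append(stripped[2:].strip())   # accumulate comment text
        elif stripped:
            if pending:                            # attach to this declaration
                comments[stripped] = " ".join(pending)
            pending = []
    return comments

proto = [
    "// A search request.",
    "message SearchRequest {",
    "  string query = 1;",
    "}",
]
```

The alternative, adding comment terminals to the AST, avoids a side table but touches every AST node type.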
Open question: does this belong in the Parser project? Or, in a separate Compiler project?
@ctaggart @bhandfast @agbogomolov @takemyoxygen Starting to take a look at the current Parser, and I'm afraid adding the above functionality will either break the existing API in several ways, or require extending it:
- Resolving `import` statements will require passing in a function which returns a string, stream, or file based on the import name.
- Parsing a file plus its imports also means returning a list of tuples containing the filename(s) and associated PProto ASTs; e.g., `( string * PProto ) list`.
- It might be useful to return the symbol table, which is needed to resolve constants.
- Capturing comments involves either adding terminals to the AST, or returning a separate table of comments (like Google's protoc.exe).
- Normalizing type names requires passing in an option (or function) to select camelCase, PascalCase, or snake_case.
- Flattening and validating won't change the API.
I'm not really in favor of adding to the public Parser.xxx functions, as the current API is fairly clean: there are only 3 functions - parse strings, streams, and files - each with a variation taking either an explicit parser or implicitly using the default parser (Parsers.pProto). This, IMHO, makes the API readily discoverable and easy to use. Requiring that the client call a "postProcess" function (or set of functions) does not make usage obvious.
I could add a `Compile` module (or some such) to hold the new functionality.
Or, I could redefine the `Parse` module functions to handle all of the above.
Any opinions?
@jhugard, I see the process as a sequence of two steps:
- parsing + lexical analysis. I guess that's what `ProtoFile.fromFile` does now.
- semantic analysis: type-usage validation, symbol lookup tables, etc.

And then the results of the 2nd step can be used by the code generator or the type provider. So it looks to me like the second step should probably have a separate module.
The question is: is it necessary to expose the results of the first step as a public API? Maybe it would be better to expose a single high-level function `Compile.file: string -> CompiledProtobuf * LookupTable` (maybe with a few overloads to deal with streams, etc.) and make the current `ProtoFile` model internal?
Starting to iterate on this in branch parser-improvements-48.
So, are import statements currently processed, or was that never completed? cc @jhugard @ctaggart
@7sharp9 I don't remember. Since you are asking, I'm guessing not. I'll help clean up these projects you are using soon.
I think I had it working on this branch: https://github.com/ctaggart/froto/compare/parser-improvements-48
May have been holding the branch open to implement the other items on the list above, but you'll need to look over the code to be sure...
@7sharp9 I've reopened #65 and changed the title for review.