froto
Parser Improvements
The parser currently turns `proto2` and `proto3` correctly into an AST, but does no further processing or semantic validation.
Proposed additions:
- [x] Process `import` statements #109
- [ ] Normalize type names
- [ ] Flatten type names
- [ ] Validate type usage
- [ ] Resolve constants
- [ ] Capture comments
Process `import` statements
Parser methods need to be enhanced with, e.g., a function parameter that can satisfy includes. For `Parse.fromFile`, this could be an include path or paths. How to provide this for strings and streams is an open question.
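One way to picture the "function parameter that can satisfy includes" is a resolver callback that maps an import name (as written in the .proto file) to source text. The sketch below is illustrative only, written in Python for brevity rather than F#; none of these names exist in froto, and the real result would pair filenames with PProto ASTs rather than raw source.

```python
import re

# Matches `import "foo.proto";` and `import public "foo.proto";`
IMPORT_RE = re.compile(r'^\s*import\s+(?:public\s+)?"([^"]+)"\s*;', re.MULTILINE)

def parse_with_imports(name, resolve, seen=None):
    """Return [(filename, source)] for `name` and everything it imports.

    `resolve` maps an import path to source text, e.g. by searching a list
    of include directories. `seen` guards against repeated/diamond imports.
    """
    seen = seen if seen is not None else set()
    if name in seen:
        return []
    seen.add(name)
    source = resolve(name)
    results = [(name, source)]
    for imported in IMPORT_RE.findall(source):
        results.extend(parse_with_imports(imported, resolve, seen))
    return results

# A resolver over an in-memory "include path"; a file-based resolver would
# probe each include directory instead.
files = {
    "a.proto": 'import "b.proto";\nmessage A {}',
    "b.proto": 'message B {}',
}
parsed = parse_with_imports("a.proto", files.__getitem__)
```

Because the resolver is just a function, the same entry point could serve files, strings, and streams, which is one possible answer to the open question above.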
Normalize Type Names
Convert various naming conventions into a single normalized form. Should support outputting in caller-selectable camelCase, PascalCase, snake_case, etc.
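The usual approach is to split an identifier into words first, then re-join in the caller-selected convention; that way snake_case, camelCase, and PascalCase inputs all normalize the same way. A minimal sketch (in Python for illustration; the function names are invented):

```python
import re

def split_words(name):
    """Split an identifier into lowercase words, handling snake_case,
    camelCase, and PascalCase input."""
    words = []
    for part in name.split("_"):
        # break "FooBar"/"fooBar" at lower->upper boundaries; keep acronyms
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words if w]

def to_snake(name):
    return "_".join(split_words(name))

def to_pascal(name):
    return "".join(w.capitalize() for w in split_words(name))

def to_camel(name):
    words = split_words(name)
    return words[0] + "".join(w.capitalize() for w in words[1:])
```

The caller-selectable part then reduces to passing one of these functions (or an equivalent option) into the normalization step.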
Flatten Type Names
Messages can contain inner message and enum definitions. These are publicly visible and can be referenced from other types. It will probably be convenient to flatten these symbols, moving them from an inner scope to the top-level scope and renaming each type with its fully-qualified scope name; e.g., "Outer.Inner".
In addition, Record types cannot contain inner types. Since .NET can handle "+" in a symbol name, the names could be normalized from "Outer.Inner" to "Outer+Inner" either in this step, or when doing code-gen for Records.
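The flattening step itself is a straightforward recursive walk. The sketch below uses an invented AST shape, (name, nested_definitions), and is in Python for illustration; passing a "+" separator yields the .NET-friendly "Outer+Inner" form mentioned above.

```python
def flatten(defs, prefix="", sep="."):
    """Lift nested definitions into a flat {qualified_name: children} map.

    defs: list of (name, nested_definitions) pairs (hypothetical AST shape).
    """
    flat = {}
    for name, children in defs:
        qualified = prefix + sep + name if prefix else name
        flat[qualified] = children
        # recurse so arbitrarily deep nesting is lifted as well
        flat.update(flatten(children, qualified, sep))
    return flat

nested = [("Outer", [("Inner", [])])]
```

With `sep="."` this produces keys "Outer" and "Outer.Inner"; with `sep="+"`, "Outer" and "Outer+Inner".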
Validate Type Usage
Ensure that every `message` used as a field type and every `enum` used as a constant is defined somewhere. Must handle use before definition (so this cannot be a single pass). Must also handle names qualified with the `package` name, as well as unqualified names resolved against the current .proto file's package.
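The two-pass shape this implies: first collect every defined type under its fully-qualified name, then check each reference, trying the referencing file's own package before the name as written. The data layout below is invented for illustration (Python, not F#), and real protobuf resolution also searches enclosing scopes, which this sketch omits.

```python
def validate(files):
    """files: {package: {"types": [...], "uses": [...]}} -> list of errors."""
    defined = set()
    for package, contents in files.items():          # pass 1: definitions
        for type_name in contents["types"]:
            defined.add(f"{package}.{type_name}" if package else type_name)
    errors = []
    for package, contents in files.items():          # pass 2: usages
        for used in contents["uses"]:
            # try the current file's package first, then the name as written
            local = f"{package}.{used}" if package else used
            if local not in defined and used not in defined:
                errors.append(f"unresolved type: {used}")
    return errors

files = {
    "pkg.a": {"types": ["Foo"], "uses": ["Foo", "pkg.b.Baz", "Missing"]},
    "pkg.b": {"types": ["Baz"], "uses": []},
}
```

Because all definitions are gathered before any usage is checked, use-before-definition falls out naturally.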
Resolve Constants
An `enum` can be used as a constant in an Option definition. Consider either resolving these, or exposing the symbol table used during type-usage validation so that client code can do the substitution.
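Either way, the substitution itself is a symbol-table lookup. A minimal sketch (Python for illustration; the names and table shape are invented), where the table maps fully-qualified enum constant names to their values:

```python
def resolve_option(value, symbols):
    """Substitute an enum-constant option value via the symbol table.

    Returns the numeric value for a known enum-constant name, or the value
    unchanged if it is already a literal (int/bool/string).
    """
    if isinstance(value, str) and value in symbols:
        return symbols[value]
    return value

# hypothetical table built during type-usage validation
symbols = {"MyEnum.FIRST": 0, "MyEnum.SECOND": 1}
```

Exposing `symbols` instead of resolving in place would let client code choose when (or whether) to substitute.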
Capture Comments
Google's protobuf compiler can capture comments related to a symbol. This can be quite useful for generating documentation, such as in a type provider. It is also needed to create a full FileDescriptorSet (a binary protobuf representation of a set of .proto files).
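The "separate table of comments" option mentioned below can be approximated as: accumulate comment lines and attach them to the next non-comment line. This is a crude line-based sketch in Python (protoc's real SourceCodeInfo tracks source locations, not text keys), purely to illustrate the shape of such a table:

```python
def capture_comments(lines):
    """Return {declaration_line: comment_text} for '//' comments that
    directly precede a non-comment line."""
    comments, pending = {}, []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("//"):
            pending.append(stripped[2:].strip())   # accumulate comment text
        elif stripped:
            if pending:                            # attach to this declaration
                comments[stripped] = " ".join(pending)
            pending = []
    return comments

proto = [
    "// A search request.",
    "message SearchRequest {",
    "  string query = 1;",
    "}",
]
```

The alternative, adding comment terminals to the AST, avoids a side table but touches every AST node type.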
Open question: does this belong in the Parser project? Or, in a separate Compiler project?
@ctaggart @bhandfast @agbogomolov @takemyoxygen Starting to take a look at the current Parser, and I'm afraid adding the above functionality will either break the existing API in several ways, or require extending it:
- Resolving `import` statements will require passing in a function which returns a string, stream, or file based on the import name.
- Parsing a file plus its imports also means returning a list of tuples containing the filename(s) and associated PProto ASTs; e.g., `( string * PProto ) list`.
- It might be useful to return the symbol table, which is needed to resolve constants.
- Capturing comments involves either adding terminals to the AST, or returning a separate table of comments (like Google's protoc.exe).
- Normalizing type names requires passing in an option (or function) to select camelCase, PascalCase, or snake_case.
- Flattening and validating won't change the API.
I'm not really in favor of adding to the public Parser.xxx functions, as the current API is fairly clean: there are only 3 functions - parse strings, streams, and files - each with a variation taking either an explicit parser or implicitly using the default parser (Parsers.pProto). This, IMHO, makes the API readily discoverable and easy to use. Requiring that the client call a "postProcess" function (or set of functions) does not make usage obvious.
I could add a `Compile` module (or some such) to hold the new functionality.
Or, I could redefine the `Parse` module functions to handle all of the above.
Any opinions?
@jhugard, I see the process as a sequence of two steps:
- parsing + lexical analysis. I guess that's what `ProtoFile.fromFile` does now.
- semantic analysis: type-usage validation, symbol lookup tables, etc.

And then the results of the 2nd step can be used by the code generator or the type provider. So it looks to me like the second step should probably have a separate module.
The question is: is it necessary to expose the results of the first step as a public API? Maybe it would be better to expose a single high-level function `Compile.file: string -> CompiledProtobuf * LookupTable` (maybe with a few overloads to deal with streams, etc.) and make the current `ProtoFile` model internal?
Starting to iterate on this in branch parser-improvements-48.
So, are import statements currently processed, or was that never completed? cc @jhugard @ctaggart
@7sharp9 I don't remember. Since you are asking, I'm guessing not. I'll help clean up these projects you are using soon.
I think I had it working on this branch: https://github.com/ctaggart/froto/compare/parser-improvements-48
May have been holding the branch open to implement the other items on the list above, but you'll need to look over the code to be sure...
@7sharp9 I've reopened #65 and changed the title for review.