Sample for parsing comments

Open jorgeleo opened this issue 1 year ago • 1 comments

hi,

I was looking for an example of parsing comments.

Single line // And this will be a comment

And multiline

/*
 Comment
 /*
   Still a comment
 */
Closing of the second comment, but still in the first comment
*/

jorgeleo avatar May 08 '24 00:05 jorgeleo

This might require some "counting" and a custom parser context to maintain the counter.

Example in Fluid: https://github.com/sebastienros/fluid/blob/main/Fluid/FluidParser.cs#L226-L247

This is the context class: https://github.com/sebastienros/fluid/blob/main/Fluid/Parser/FluidParseContext.cs

The idea is to increment a counter when you find /* and decrement it when you find */. The comment is only closed once the counter is back to zero. If you reach EOF while the counter is still positive, then either everything is a comment or you can return an error.
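
The counting idea can be sketched without any Parlot types. This is only a minimal illustration in plain C#, not the Fluid implementation linked above: the class name NestedCommentScanner, the method SkipComment, and the allowNesting flag are hypothetical names chosen for this example, and it assumes an unterminated comment should be reported as a failure (-1) rather than treated as a comment running to EOF.

```csharp
using System;

public static class NestedCommentScanner
{
    // True when `token` occurs in `text` exactly at position `index`.
    private static bool StartsWithAt(string text, int index, string token)
    {
        return index >= 0
            && index + token.Length <= text.Length
            && string.CompareOrdinal(text, index, token, 0, token.Length) == 0;
    }

    // Returns the index just past the comment that opens at `start`.
    // Returns -1 when no "/*" opens at `start`, or when EOF is reached
    // while the counter is still positive (assumption: treat this as an error).
    public static int SkipComment(string text, int start, bool allowNesting = true)
    {
        if (!StartsWithAt(text, start, "/*"))
        {
            return -1;
        }

        int depth = 1;      // the counter: one comment is currently open
        int i = start + 2;  // continue right after the opening "/*"

        while (i < text.Length)
        {
            if (allowNesting && StartsWithAt(text, i, "/*"))
            {
                depth++;    // a nested comment opens
                i += 2;
            }
            else if (StartsWithAt(text, i, "*/"))
            {
                depth--;    // the innermost open comment closes
                i += 2;
                if (depth == 0)
                {
                    return i; // counter is back to zero: comment fully closed
                }
            }
            else
            {
                i++;
            }
        }

        return -1; // EOF reached with depth > 0
    }
}
```

With allowNesting set to false, the first */ closes the comment, which matches how C# and JS treat /* ... */.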

sebastienros avatar May 08 '24 00:05 sebastienros

@jorgeleo I implemented a simple and configurable comment parser:

            // Whitespace, "--" line comments and /* */ block comments, nesting enabled (last argument: true)
            WhiteSpaceWithCommentsParser = Literals.ExtendedWhiteSpace(
                s => s.SkipWhiteSpaceOrNewLine(),
                s => s.SkipSingleLineComment("--"),
                s => s.SkipMultiLineComment("/*", "*/", true)
            ).Then(static x => x.ToString());

            // Same, but with nesting disabled (last argument: false)
            WhiteSpaceWithCommentsParserNoNesting = Literals.ExtendedWhiteSpace(
                s => s.SkipWhiteSpaceOrNewLine(),
                s => s.SkipSingleLineComment("--"),
                s => s.SkipMultiLineComment("/*", "*/", false)
            ).Then(static x => x.ToString());

https://github.com/lampersky/UsefulParlotParsers/blob/main/src/Lampersky.UsefulParlotParsers.Tests/ExtendedWhiteSpaceParserTests.cs

/cc @sebastienros

lampersky avatar Oct 28 '25 18:10 lampersky

I added this so we can set any parser to handle WS/comments.

Next is to have these parsers, @lampersky, available to pass to this new extension. I was thinking of making independent parsers that are optimized for parsing:

  • using a vector search on \n|EOF, reusing NoneOf('\n'), but maybe a dedicated one that takes a whole string instead of chars
  • anything before this text (*/), again using the same new parser

Then with the new parser we could add custom helpers for building standard comment support in the extensions.
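
The two searches listed above could look roughly like this in plain C#. It is only a sketch under stated assumptions: CommentSpans, LengthUntilLineEnd and LengthUntil are hypothetical names, not Parlot APIs, and it relies on the standard span IndexOf methods, leaving any vectorization to the runtime's own implementation of those calls.

```csharp
using System;

public static class CommentSpans
{
    // Single-line comment body: number of chars before '\n', or up to EOF
    // when the input contains no newline.
    public static int LengthUntilLineEnd(ReadOnlySpan<char> input)
    {
        int index = input.IndexOf('\n');
        return index < 0 ? input.Length : index;
    }

    // Number of chars before the whole `terminator` string (for example "*/"),
    // or -1 when the terminator never appears in the input.
    public static int LengthUntil(ReadOnlySpan<char> input, ReadOnlySpan<char> terminator)
    {
        return input.IndexOf(terminator, StringComparison.Ordinal);
    }
}
```

For example, LengthUntil(" body */ rest", "*/") gives the length of " body ", and a parser built on top of it would then consume the terminator itself.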

sebastienros avatar Oct 28 '25 20:10 sebastienros

@sebastienros WithWhiteSpaceParser works great! If you have time, please have a look at whether the approach with the comment-parsing logic inside the scanner (and parser) is OK, or whether you are thinking about something else.

Image

Here all the comment styles known from the SQL language are tested.

When will Parlot 1.5.2 with those latest extensions be shipped?

lampersky avatar Oct 29 '25 21:10 lampersky

Are there any languages that support nested comments? I tried C#, JS and SQL with no luck.

sebastienros avatar Oct 30 '25 15:10 sebastienros

This is a screenshot from MS SQL Server Management Studio:

Image

As you can see, you can nest a multiline comment inside another one. In C# and JS it won't work.

lampersky avatar Oct 30 '25 16:10 lampersky

Copilot tells me otherwise. Maybe it's just a feature of SQL Server Management Studio:

In SQL, multi-line comments are typically enclosed between /* and */. However, nested multi-line comments (placing one multi-line comment inside another) are not supported in most SQL implementations, including SQL Server.

sebastienros avatar Oct 30 '25 16:10 sebastienros

The doc says it is supported: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/slash-star-comment-transact-sql?view=sql-server-ver16#remarks

sebastienros avatar Oct 30 '25 16:10 sebastienros