cpp-peglib
Error reporting questions
This is not a defect report, just a request for guidance.
I'm looking to add nested comments, and I'd like to be able to take the input:
/* line 1:1 is the first comment open
/* line 2:2 is the second
/* line 3:3 and so on
/* line 4:4
/* line 5:5
*/ // closes line 5:5 (this line has a newline, so the EOI is at 7:1)
and report
error: 7:1: unexpected end of input while 4 block comments remain open:
note: 4:4: comment opened here
note: 3:3: comment opened here
note: 2:2: comment opened here
note: 1:1: comment opened here
or at least
error: 7:1: unterminated comment
error: 4:4: comment opened here
The line split is important so that I can follow the FLClm (file-line-column: level: message) convention, which allows IDEs to take users to the source location:
/users/badprogrammer/terribleness/hideous.cpp:7:1: error: unexpected end of input...
/users/badprogrammer/terribleness/hideous.cpp:4:4: note: comment opened here
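As a point of reference, the diagnostics requested above can be sketched without cpp-peglib at all, as a hand-written scan that keeps a stack of the positions where comments were opened. This is only an illustration of the desired behaviour, not a cpp-peglib API: `Pos`, `flc`, and `check_block_comments` are hypothetical names, columns are counted in bytes, and a stray `*/` outside any comment is deliberately not diagnosed. It assumes C++17.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// 1-based line and column (columns count bytes, not code points).
struct Pos {
    std::size_t line = 1;
    std::size_t col = 1;
};

// Formats "file:line:col: level: message".
std::string flc(std::string_view file, Pos p, std::string_view level,
                std::string_view msg) {
    std::string out(file);
    out += ":" + std::to_string(p.line) + ":" + std::to_string(p.col) + ": ";
    out += level;
    out += ": ";
    out += msg;
    return out;
}

// Scans src for nested /* ... */ comments and returns one error plus one
// note per comment still open at end of input (innermost first).
// Returns an empty vector when every comment is closed.
std::vector<std::string> check_block_comments(std::string_view file,
                                              std::string_view src) {
    std::vector<Pos> open;  // start positions of comments not yet closed
    Pos pos;                // position of the byte at index i
    std::size_t i = 0;

    // Consume n bytes, keeping the line/column counter in step.
    auto advance = [&](std::size_t n) {
        for (; n > 0 && i < src.size(); --n, ++i) {
            if (src[i] == '\n') { ++pos.line; pos.col = 1; }
            else { ++pos.col; }
        }
    };

    while (i < src.size()) {
        const std::string_view two = src.substr(i, 2);
        if (open.empty() && two == "//") {
            // Top-level line comment: skip up to (not past) the newline.
            while (i < src.size() && src[i] != '\n') advance(1);
        } else if (two == "/*") {
            open.push_back(pos);
            advance(2);
        } else if (two == "*/" && !open.empty()) {
            open.pop_back();
            advance(2);
        } else {
            advance(1);  // ordinary byte, or a stray "*/" (not diagnosed)
        }
    }

    std::vector<std::string> diags;
    if (open.empty()) return diags;

    const std::string count = std::to_string(open.size());
    diags.push_back(flc(file, pos, "error",
        "unexpected end of input while " + count +
        (open.size() == 1 ? " block comment remains open"
                          : " block comments remain open")));
    for (auto it = open.rbegin(); it != open.rend(); ++it)
        diags.push_back(flc(file, *it, "note", "comment opened here"));
    return diags;
}
```

Calling `check_block_comments(path, source)` before or alongside the PEG parse would give one line per diagnostic, which is the shape IDEs expect. Whether the same stack can be maintained from inside the grammar is exactly the open question here.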
Sample:
program <- (~NL / expr)* ~EOI
~BLOCK_COMMENT <- '/*' ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* ('*/'^unterminated_comment)
~LINE_COMMENT <- '//' [^\n]*
~NOISE <- [ \f\r\t] / BLOCK_COMMENT
EOI <- !.
NL <- NOISE* LINE_COMMENT? '\n'
# error recovery
unterminated_comment <- EOI { message "unterminated block comment" }
expr <- 'hello'
I looked at using parser["BLOCK_COMMENT"].enter/.leave, but I didn't see a way to capture the positions. I also looked at %recovery/label{message}, but it only appears to be able to tell me where the unexpected symbol is.
This made me think that I might need to use a negative lookahead?
# NLAv1
~BLOCK_COMMENT <- '/*' (!EOI^unterminated_comment ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] ))* '*/'
This still reports the error at the EOI.
# NLAv2
~BLOCK_COMMENT <- '/*' (!EOI^unterminated_comment / ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/')^unterminated_comment
An improvement: this tells me the location of the outermost unterminated comment, but I really don't like the look of the (!EOI / ...)*. Also, would this be an appropriate place for a cut?
# NLAv3
~BLOCK_COMMENT <- '/*' ↑ (!EOI^unterminated_comment / ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/')^unterminated_comment
So perhaps:
# NLAv4
~BLOCK_COMMENT <- '/*' TERMINATED_COMMENT^unterminated_comment
~TERMINATED_COMMENT <- ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/'
This reports the outermost unterminated comment. Given the input:
// this is a comment
/* this is a comment /* // and this doesn't stop it <-- 2:6 is here
so this is also a comment // /* as is this // */
we're still commenting /* <- this is also unterminated
hello
2:6 unterminated block comment
Being able to somehow indicate the depth would be really helpful. Nested block comments are far more complex than I imagined because, for instance, strings and escapes aren't taken into account, and line comments don't factor in, or if they did...
/* printf("unterminated comment: did you forget the '*/'?"); */ # oops
vs
// printf("unterminated comment: did you forget the '/*?'"); */ # ok
/*
// I commented this line out */
/*
// I'm commenting this comment out /* the cake is a lie */
*/
printf("// disallow /* in strings"); // */ not required
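The ambiguity in the snippets above comes down to a policy choice: does a `//` that appears inside a block comment hide the rest of its line, including any `*/` on it? A minimal standalone sketch (standard C++17 only, not cpp-peglib; `LineCommentPolicy` and `open_depth_at_eoi` are hypothetical names, and string literals are ignored entirely) makes the difference measurable by returning how many block comments are still open at end of input under each policy.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical policy switch for "//" found inside a block comment.
enum class LineCommentPolicy {
    IgnoredInsideBlock,  // "//" is plain text inside /* ... */
    HidesRestOfLine      // "//" hides the rest of the line everywhere
};

// Returns the number of block comments still open at end of input.
// Outside any block comment, "//" always starts a line comment.
std::size_t open_depth_at_eoi(std::string_view src, LineCommentPolicy policy) {
    std::size_t depth = 0;
    std::size_t i = 0;
    while (i < src.size()) {
        const std::string_view two = src.substr(i, 2);
        const bool line_comments_active =
            depth == 0 || policy == LineCommentPolicy::HidesRestOfLine;
        if (two == "//" && line_comments_active) {
            // Skip up to (not past) the newline.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (two == "/*") {
            ++depth;
            i += 2;
        } else if (two == "*/" && depth > 0) {
            --depth;
            i += 2;
        } else {
            ++i;  // ordinary byte, or a stray "*/" at depth 0
        }
    }
    return depth;
}
```

For the two-line input `/*` followed by `// I commented this line out */`, the first policy closes the comment (depth 0 at end of input) while the second leaves it unterminated (depth 1). Whichever policy the grammar adopts, the error message probably needs to hint at it.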
I'm going to go stick my head in a bucket of cold water now. If you close this ticket and pretend it never happened, I'll totally understand :)