holyc-lang icon indicating copy to clipboard operation
holyc-lang copied to clipboard

[BUG] - segfault if the parser cannot find end marker of a comment

Open rilysh opened this issue 8 months ago • 3 comments

Describe the bug Hello, In the following example, compiler parser segfaults if tokenRingBufferPeek() returns NULL.

U0 Main() {
   /*
}

A debug build (address sanitizer enabled), it points to cctrlTokenPeek(), which calls tokenRingBufferPeek() and it returns a NULL pointer. Functions that call ccttrlTokenPeek(), doesn't handle it's return value, and tries to dereference the NULL pointer.

I don't know why ring_buffer size is 0 (if /* doesn't end with */), I assume ring buffer is just keep popping and it doesn't see that we didn't end the comment line.

To Reproduce Steps to reproduce the behavior:

  1. Compile the following example program.

Example Program

U0 Main() {
   /*
}

Expected behavior No error at all.

Compiler version hcc beta-v0.0.9 binary: /usr/local/hcc commit-hash: fd2d4105fae24dc24137c70b293f0b4e32675a83 arch: Linux x86_64 build: Debug

Desktop (please complete the following information):

  • OS: N/A

rilysh avatar Mar 01 '25 18:03 rilysh

Thanks for this,

If you fancy tackling this I'd suggest modifying this; https://github.com/Jamesbarford/holyc-lang/blob/fd2d4105fae24dc24137c70b293f0b4e32675a83/src/lexer.c#L986-L991

To be somethinglike:

if (*l->ptr == '/' || *l->ptr == '*') {
    start = l->ptr-1;
    int original_lineno = l->lineno;
    lexSkipCodeComment(l);
    /* Check for unterminated comments */
    if (*l->ptr == '\0' || *(l->ptr + 1) == '\0') {
        loggerPanic("Line: %d Unterminated comment\n", original_lineno);
    }
    int lineno = l->lineno;

That said; if you can think of a better solution I'm all for it!

If you wanted to make a nicer error there is a function called lexerReportLine(...) at the bottom of the file which will, presuming that it works, give you the line of the current file being lexed. Meaning we could show the problem as well as the line number.

Jamesbarford avatar Mar 01 '25 20:03 Jamesbarford

Hi,

if (l->ptr == '/' || l->ptr == '') { start = l->ptr-1; int original_lineno = l->lineno; lexSkipCodeComment(l); / Check for unterminated comments */ if (*l->ptr == '\0' || *(l->ptr + 1) == '\0') { loggerPanic("Line: %d Unterminated comment\n", original_lineno); } int lineno = l->lineno;

Above one is good, but it won't able to handle something like /* */ */ And even if we do this

if (*(l->ptr + 1) == '*' && *(l->ptr + 2) == '/') {
	loggerPanic("Line: %d comment never started\n", l->lineno);
}

This won't handle /* */ */ but should /* */ */, (spaces in between two ending tokens needs to be removed first).

rilysh avatar Mar 02 '25 06:03 rilysh

Yes, you're right;

U0 Main()
{
  /* ... */ */
}

Segfaults. It seems there is probably something wrong with the parser and the expression parsing. I imagine it sees *, possibly thinks it's a dereference and then / a division which isn't elegently handled causing a segfault.

Though this problem is arguably separate from a non-terminated comment as the following has the same behviour, resulting in a segfault:

U0 Main()
{
  /* ... */ &/
}

Edit: I think Ast *parseUnaryExpr(Cctrl *cc); would possibly be the culpit for the expression

Jamesbarford avatar Mar 02 '25 19:03 Jamesbarford