cppfront icon indicating copy to clipboard operation
cppfront copied to clipboard

[BUG] Parsing bug when multiplying a negative number

Open mike919192 opened this issue 1 year ago • 7 comments

Describe the bug Multiplying by a negative number is being parsed incorrectly. Putting the negative number in parenthesis is a workaround to parse it correctly.

To Reproduce This cpp2 code:

#include <iostream>
#include <string>

main: () = {
    
    std::cout << 1 * -1 << name() << "\n";
    std::cout << 1 * (-1) << name() << "\n";
}

Compiles to this cpp code


//=== Cpp2 type declarations ====================================================


#include "cpp2util.h"



//=== Cpp2 type definitions and function declarations ===========================

#include <iostream>
#include <string>

auto main() -> int;

//=== Cpp2 function definitions =================================================


auto main() -> int{

    std::cout << *cpp2::impl::assert_not_null(1) - 1 << name() << "\n";
    std::cout << 1 * (-1) << name() << "\n";
}

https://godbolt.org/z/KMc6sxoPe

Additional context This is present in the head, but not in the release 0.7.0

mike919192 avatar May 12 '24 22:05 mike919192

I'll take a look at this issue.

sookach avatar May 16 '24 00:05 sookach

Sorry for the late response, and thank you for the reporting this bug. Not sure if you could already tell, but essentially we're parsing the first expression as a dereference of '1' minus '1' which is not correct. The reason the next line doesn't have the same mistake is because the parser has a specific rule for disallowing a postfix star to be treated as a dereference if followed by a left parenthesis as below: https://github.com/hsutter/cppfront/blob/d09b052c130d509d747a114dbb6814eb08a804ec/source/parse.h#L5798

The reason it works in version 0.7.0 but not head is because of commit 5663493 which was published shortly after the release.

@hsutter From what I can tell, in order to fix this issue we would need to change the parser behavior (which would break the code that prompted the original commit).

We can't just add:

MINUS LITERAL PLUS LITERAL

to the list of token(s) that can't follow a unary star, because even though it would fix this specific issue, that rule would prevent valid binary expressions involving a dereference, and it would also require

EXCLAIM+ LITERAL

to be added to the list of prohibited successor token(s) which would introduce an unbounded lookahead since any amount of exclaims can follow in each other in sequence.

Curious what you think the best course of action is, since it seems we will need to change the language's behavior.

sookach avatar May 17 '24 21:05 sookach

Edited to add: I think the following may also address the way the first line parses, because then like the second line it will see interpreting it as a postfix operator doesn't work and backtrack and try multiplication? Need to try it.

Thanks! I'm just in the middle of another bit where I'm not in a compilable state, but I would try this in parse.h:5798 (this is notes for myself, or in case someone wants to try it feel free):

-           //  * and & can't be unary operators if followed by a (, identifier, or literal
+           //  * and & can't be unary operators if followed by a (, identifier, prefix operator, or literal
            if (
                (
                    curr().type() == lexeme::Multiply
                    || curr().type() == lexeme::Ampersand
                    )
                && peek(1)
                && (
                    peek(1)->type() == lexeme::LeftParen
                    || peek(1)->type() == lexeme::Identifier
                    || is_literal(peek(1)->type())
+                   || is_prefix_operator(*peek(1))
                    )
                )
            {
                break;
            }

hsutter avatar May 17 '24 23:05 hsutter

Thanks @hsutter. So, the change you propose does indeed fix this issue, but as I suspected:

that rule would prevent valid binary expressions involving a dereference

it introduces another issue. Consider the following code:

x := new<int>(1);
std::cout << x* - 1 << "\n";

this now translates to

auto x {cpp2_new<int>(1)}; 
std::cout << cpp2::move(x) * -1 << "\n";

whereas before it would translate correctly.

sookach avatar May 18 '24 00:05 sookach

Ah, good point. I'll think about this more. Thanks again for the report and the notes!

hsutter avatar May 18 '24 00:05 hsutter

got another example (beside #1216) that breaks on current master:

main: () = {
    cx : i32 = 10;
    _ = float(cx); // works fine
    _ = PI / float(cx); // works fine too
    _ = float(PI) / float(cx); // works
    _ = PI * cx; // works fine
    _ = float(PI) * cx; // works

    // break with: error: invalid statement encountered inside a compound-statement
    _ = PI * float(cx);
    _ = float(PI) * float(cx);
}

Green-Sky avatar Aug 20 '24 09:08 Green-Sky

got another example (beside #1216) that breaks on current master:

Thank you for the report.

sookach avatar Aug 20 '24 10:08 sookach