cppfront icon indicating copy to clipboard operation
cppfront copied to clipboard

[SUGGESTION][FIX] Add support for raw string literals in mixed mode

Open filipsajdak opened this issue 3 years ago • 8 comments

Current implementation in mixed mode is not parsing raw string literals well.

Cpp2 source code

auto r = R"(
    this is raw string literal
    \n
    /*   */
    //
i: int = 42;
)";

/*
 *
 *   this is a comment
 *
 */

main: () -> int = {
    std::cout << "Hello world\n\n" << std::quoted(r) << std::endl;
}

Compiles to

// ----- Cpp2 support -----
#include "cpp2util.h"

#line 1 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"

auto r = R"(
    this is raw string literal
    \n
    /*   */
    //

#line 7 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"
#line 8 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"
)";

#line 16 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"
[[nodiscard]] auto main() -> int;

//=== Cpp2 definitions ==========================================================

#line 7 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"
int i { 42 }; 
#line 9 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"

/*
 
    this is a comment
 
 */

[[nodiscard]] auto main() -> int{
    std::cout << "Hello world\n\n" << std::quoted(r) << std::endl;
}

The proposed change makes cppfront parse raw string literals in a similar way as it parses /* */ comments. After this change cppfront produces:

// ----- Cpp2 support -----
#include "cpp2util.h"

#line 1 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"

auto r = R"(
    this is raw string literal
    \n
    /*   */
    //
i: int = 42;
)";

#line 16 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"
[[nodiscard]] auto main() -> int;

//=== Cpp2 definitions ==========================================================

#line 9 "/Users/filipsajdak/dev/execspec/external/tests/raw_string.cpp2"

/*
 
    this is a comment
 
 */

[[nodiscard]] auto main() -> int{
    std::cout << "Hello world\n\n" << std::quoted(r) << std::endl;
}

Debug info shows that the lines are recognized as /* R */ which was added symbol to mark raw string literals lines.

/*   */ 
/* 1 */ auto r = R"(
/* R */     this is raw string literal
/* R */     \n
/* R */     /*   */
/* R */     //
/* R */ i: int = 42;
/* 1 */ )";
/* 2 */ 
/* 2 */ /*
/* 2 */  *
/* 2 */  *   this is a comment
/* 2 */  *
/* 2 */  */
/* 2 */ 
/* 2 */ main: () -> int = {
/* 2 */     std::cout << "Hello world\n\n" << std::quoted(r) << std::endl;
/* 2 */ }

Close https://github.com/hsutter/cppfront/issues/62

filipsajdak avatar Oct 11 '22 17:10 filipsajdak

In the last push, I have added also support for d-char-sequence from spec:

prefix(optional) R"d-char-sequence(optional)
(r-char-sequence(optional))d-char-sequence(optional)"

The only thing I did not check yet is the prefix.

The current implementation can handle:

auto r = R"abc(
    this is raw string literal
    \n
    /*   */
    R"(this should be ignored)"
    R"ab(this should be also ignored)ab"
    //
i: int = 42;
)abc";

filipsajdak avatar Oct 11 '22 21:10 filipsajdak

I have tested the prefixes and it seems to work, too.

filipsajdak avatar Oct 11 '22 21:10 filipsajdak

I actually completely forgot about raw string literals... thanks for doing this. At first blush this looks good, before merging let me take a deeper look with a fresh brain and a fresh cup of coffee...

Did you check the three /passthrough-tests files to make sure they still are diff-identical when passed through cppfront?

hsutter avatar Oct 18 '22 00:10 hsutter

No, unfortunately I did not do this.

Will check that.

filipsajdak avatar Oct 18 '22 04:10 filipsajdak

I think https://github.com/modern-cmake/cppfront does that.

JohelEGP avatar Oct 18 '22 04:10 JohelEGP

I have rebased my changes to the main and done the passthrough tests.

clang & GCC files are the same.

MSVC ends with the error:

cppfront % ../../build/external/cppfront passthrough-tests/msvc-msstl-e.cpp2      
passthrough-tests/msvc-msstl-e.cpp2...
msvc-msstl-e.cpp2(174324,0): error: closing } does not match a prior {

Need to check what caused the error.

filipsajdak avatar Oct 18 '22 12:10 filipsajdak

OK, found a bug and fix it.

The issue was here

diff --git a/source/load.h b/source/load.h
index 71fd857..9954a02 100644
--- a/source/load.h
+++ b/source/load.h
@@ -248,7 +248,7 @@ auto process_cpp_line(
                 return r;
             }
             in_raw_string_literal = false;
-            i += end_pos+d_char_sequence.size()-1;
+            i = end_pos+d_char_sequence.size()-1;
         }
         else {
             r.all_comment_line = false;

Small but nasty mistake. Now all files compiles and generate identical output.

filipsajdak avatar Oct 18 '22 17:10 filipsajdak

Rebased to main - all passthrough-tests and regression-tests passed.

filipsajdak avatar Oct 30 '22 19:10 filipsajdak

I tried this out with the passthrough-tests, starting with running cppfront msvc-msstl-e.cpp2, but it crashed.

Running it under the debugger, I see that the crash is an out-of-bounds access attempt in the second-last line in process_cpp_line:

prev = line[i];

where i was about 20-30 higher than line.size().

I didn't investigate further, but this should give you a repro and a roadmap to find the bug... can you take a look?

hsutter avatar Dec 11 '22 01:12 hsutter

Hmm... I am surprised that it crashed. I will investigate

filipsajdak avatar Dec 11 '22 09:12 filipsajdak

I have trouble reproducing the crash. I even have added an assert before the line that you have spotted (lines 304-306):

    assert(i < line.size());
    prev = line[i];
}

There was a bug that I fixed (https://github.com/hsutter/cppfront/pull/69#issuecomment-1282748401), and it is already pushed.

Can you check if line 251 uses = instead of +=? The correct line should look like this:

i = end_pos+raw_string_closing_seq.size()-1;

I have compiled my cppfront in clang

Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

@hsutter Can you tell me which compiler you have used?

filipsajdak avatar Dec 11 '22 16:12 filipsajdak

The interesting fact is that only MSVC is using raw string literals in its lib. It is always inline and finalised with the string_view suffix. E.g.:

_Result += R"(: ")"sv;

so d_char_sequence will be empty.

filipsajdak avatar Dec 11 '22 16:12 filipsajdak

@hsutter I have rebased my changes on the newest main and still cannot reproduce. I will probably need to check on other compilers.

filipsajdak avatar Dec 12 '22 00:12 filipsajdak

@hsutter I am confused. I have tried it on MSVC and it works as well. I will need more information To be able to reproduce it.

My very first guess is that it looks like the previous bug that I have fixed (check above). Some merge issues might also cause it.

filipsajdak avatar Dec 12 '22 20:12 filipsajdak

OK, I'll take another look. Thanks!

hsutter avatar Dec 12 '22 23:12 hsutter

@hsutter if it will not work, please send me the code you have via email I will verify what is wrong.

filipsajdak avatar Dec 13 '22 00:12 filipsajdak

Good news: It works fine!

Sorry for the noise: The test failure was almost certainly my error when I opened the branch in GitHub Desktop and must just not have had all the latest commits somehow. (I did try pulling the later commits, but there was a conflict and I'm just not familiar enough with switching to PR branches in GitHub Desktop, and I'm sadly not a git command-line person.)

So I have verified the PR by the time-honored method of manually applying it to my main branch and running the passthrough tests, and they work fine. So now I'll revert that and merge this PR. :) Thanks again.

hsutter avatar Dec 15 '22 00:12 hsutter

Happy to hear that it's working.

The good thing is that this situation motivated me to run sanitizers which helped in discovering another issue.

filipsajdak avatar Dec 15 '22 00:12 filipsajdak