cppfront icon indicating copy to clipboard operation
cppfront copied to clipboard

[BUG] Cppfront can not parse "\x{62}"

Open MaxSagebaum opened this issue 1 year ago • 8 comments

Cppfront produces the error:

main.cpp2(3,22): error: string literal "\" is missing its closing "
main.cpp2(5,32): error: local variable c cannot be used in its own initializer

while parsing ""\x{62}blub""

A reproducer on compiler explorer: https://cpp2.godbolt.org/z/h6e8vc1rv

One possible patch is:

diff --git a/source/lex.h b/source/lex.h
index bab314a..ba3be0e 100644
--- a/source/lex.h
+++ b/source/lex.h
@@ -939,10 +939,16 @@ auto lex_line(
         if (
             peek(  offset) == '\\'
             && peek(1+offset) == 'x'
-            && is_hexadecimal_digit(peek(2+offset))
+            && (is_hexadecimal_digit(peek(2+offset))
+              || (peek(2+offset) == '{' && is_hexadecimal_digit(peek(3+offset)))
             )
+        )
         {
+            bool has_bracket = peek(2+offset) == '{';
             auto j = 3;
+
+            if (has_bracket) { ++j; }
+
             while (
                 peek(j+offset)
                 && is_hexadecimal_digit(peek(j+offset))
@@ -950,6 +956,14 @@ auto lex_line(
             {
                 ++j;
             }
+
+            if (has_bracket) {
+                if (peek(j+offset) == '}') {
+                    ++j;
+                } else {
+                    return 0;
+                }
+            }
             return j;
         }
         return 0;

It seems that there is currently no regression test for the escapes.

MaxSagebaum avatar Jul 08 '24 08:07 MaxSagebaum

Created a pull request https://github.com/hsutter/cppfront/pull/1153.

MaxSagebaum avatar Jul 08 '24 08:07 MaxSagebaum

Thanks! TIL:

can not parse "\x{62}"

I wasn't familiar with that extension, and after some spelunking discovered it was added for C++23 by paper P2290R3.

It seems to be accepted by GCC 13.1 or higher and Clang 14 or higher (and they backported it all the way to -std=c++98 mode). It does not seem to work at all on MSVC though.

I think there is an even better alternative fix though: C++23 "\x{62}blub" has the identical meaning as C++98 "\x62" "blub", so if I (finally) supported string literal concatenation, that would give you a way to write this. Would that solve your problem?

hsutter avatar Jul 08 '24 20:07 hsutter

Yes, that would also solve my problem.

We could enable it just for clang and gcc and enable it when MSVC supports it.

MaxSagebaum avatar Jul 08 '24 21:07 MaxSagebaum

On a second thought, it would solve the problem in cppfront but depending on how cppfront outputs concatinated strings I would then have the problem in the cpp1 compiler.

  • If concat strings are output as concat strings everything is fine.
  • If concat strings are output as one large string I would get problems in the cpp compiler.

I have also the option to leave the escape sequence there in the generated string, but having the escape char in a source file is not really feel very good.

MaxSagebaum avatar Jul 08 '24 21:07 MaxSagebaum

If concat strings are output as concat strings everything is fine.

My thought about that was that concat strings that don't use interpolations would be emitted as concat strings in Cpp1.

Alternatively, if we did use P2290 delimited escape sequences, would those require a C++23-era compiler? Or would cppfront need to interpret the escape and replace it with the escaped character in the generated file in order to still work downlevel (and is that always legal)?

I have also the option to leave the escape sequence there in the generated string, but having the escape char in a source file is not really feel very good.

Right, but would doing that for now work sufficiently to unblock you while waiting for this issue to be resolved?

Thanks again for reporting this, and TIL about P2290!

hsutter avatar Jul 08 '24 21:07 hsutter

I think handling them transparently (meaning recognizing them as valid and simply outputting them unchanged) is probably the safest option. Then to the user, don't use them unless your cpp1 compiler also supports them. That would mean that those using cppfront with a supporting cpp1 compiler could use them, but cpp2 itself wouldn't use them as not all the cpp1 compilers that it tests with support them.

gregmarr avatar Jul 09 '24 20:07 gregmarr

This is actually a valid point. So to summarize:

  • Cppfront forwards \x{...} escapes
  • Cppfront code is not allowed to use it since not all targeted compilers support it.

MaxSagebaum avatar Jul 10 '24 08:07 MaxSagebaum

Correct, and slightly finer-grained: Cppfront couldn't use it internally because cppfront has to be compilable on pre-P2290 compilers, and cppfront couldn't emit it (unless via passthrough if originally authored by the user) unless the user actually wrote it in their code.

hsutter avatar Jul 10 '24 21:07 hsutter