JuliaSyntax.jl: macros juxtaposed with strings incorrectly(?) receive a raw string literal

If you directly juxtapose a macro call with a string (no whitespace between them), the macro receives a raw literal version of the string, as if it were a string macro, even though it isn't.

Here is an example:

julia> macro m(x)
           return esc(x)
       end
@m (macro with 1 method)

julia> @m "hey $(2+2)"
"hey 4"

julia> @m"hey $(2+2)"
"hey \$(2+2)"

julia> m"hey $(2+2)"
ERROR: LoadError: UndefVarError: `@m_str` not defined
in expression starting at REPL[4]:1

I think this is a bug. On Julia 1.6 (which I happen to have installed, and which predates JuliaSyntax), we get this error instead:

julia> VERSION
v"1.6.7"

julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"
Stacktrace:
 [1] top-level scope
   @ none:1

I believe the mistake is that the macro call is being parsed as if it were a string macro, since the input matches what a string macro would see:

julia> macro m_str(x)
           return esc(x)
       end
@m_str (macro with 1 method)

julia> m"hey $(2+2)"
"hey \$(2+2)"

I see the current behavior on both Julia 1.10 and 1.12:

julia> versioninfo()
Julia Version 1.12.0-DEV.1173
Commit 169e9e8de1* (2024-09-09 15:10 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH =

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_SSL_CA_ROOTS_PATH =

Sep 27 '24 00:09 NHDaly

This is due to a lexing difficulty: we need to know which strings are raw while lexing, and we use lexer state to guess (i.e., a heuristic based on the previous token). In this case the heuristic fails, so we need to fix that.

julia> collect(JuliaSyntax.Tokenize.tokenize("@m\"str\$x\""))
6-element Vector{JuliaSyntax.Tokenize.RawToken}:
 0-0        @              
 1-1        Identifier     
 2-2        "              
 3-7        String         
 8-8        "              
 9-8        EndMarker      

julia> collect(JuliaSyntax.Tokenize.tokenize("@m \"str\$x\""))
9-element Vector{JuliaSyntax.Tokenize.RawToken}:
 0-0        @              
 1-1        Identifier     
 2-2        Whitespace     
 3-3        "              
 4-6        String         
 7-7        $              
 8-8        Identifier     
 9-9        "              
 10-9       EndMarker

I think it'd be enough to track the two previous tokens and check whether one was an @; this is probably good enough in practice.

(Unfortunately, Julia also allows syntax like @A.B.C.x"str" to mean A.B.C.@x"str", and making that also work would need feedback from the parser state to the lexer (uuugh!), but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)
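A minimal toy sketch of the "track the two previous tokens" heuristic, in plain Julia. Everything here is hypothetical and for illustration only: `TokKind`, `PrevTokens`, `record!`, `string_is_raw` and `lex_prefix` are invented names, not JuliaSyntax's real lexer state or `Kind` type. The sketch also assumes the intended fix is that `@m"..."` should be lexed as an ordinary (interpolating) string rather than a raw one.

```julia
# Toy model of lexer state for deciding whether a string literal is "raw".
# Hypothetical names only; JuliaSyntax's real lexer uses its own Kind type.
@enum TokKind AT IDENTIFIER WHITESPACE OTHER

# Remember the two most recently lexed token kinds.
mutable struct PrevTokens
    prev2::TokKind
    prev1::TokKind
end
PrevTokens() = PrevTokens(OTHER, OTHER)

# Shift the two-token window forward by one token.
function record!(s::PrevTokens, k::TokKind)
    s.prev2 = s.prev1
    s.prev1 = k
    return s
end

# Called when an opening `"` is seen: should the string body be lexed raw?
function string_is_raw(s::PrevTokens)
    # Only an identifier directly juxtaposed with the quote (e.g. `m"..."`)
    # forms a string-macro call, whose body is raw.
    s.prev1 == IDENTIFIER || return false
    # Proposed extra check (assumption): in `@m"..."` the identifier is a
    # macro name, so the string should be lexed normally.
    return s.prev2 != AT
end

# Helper: build the lexer state from the token kinds seen before the quote.
lex_prefix(kinds...) = foldl(record!, kinds; init = PrevTokens())

string_is_raw(lex_prefix(IDENTIFIER))                 # m"..."   -> true
string_is_raw(lex_prefix(AT, IDENTIFIER))             # @m"..."  -> false
string_is_raw(lex_prefix(AT, IDENTIFIER, WHITESPACE)) # @m "..." -> false
```

Because this check only looks two tokens back, it would still lex the string in `@A.B.C.x"str"` as raw: the `@` is several tokens earlier. As noted in the comment, handling that form properly would need feedback from the parser.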

Oct 07 '24 11:10 c42f

> and making that also work would need feedback from the parser state to the lexer (uuugh!)

😢

Oct 07 '24 11:10 KristofferC

🤔 I actually think you could leave the lexer as-is. Given the above tokens, I think we could still raise an error later, during parsing or lowering, for the juxtaposed macro call and string? In other words, it's okay that we "incorrectly" lexed a raw string, since we're going to throw an error later for the lack of whitespace?

That would be consistent with the 1.6 behavior:

julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"

I think you could parse

 0-0        @              
 1-1        Identifier     
 2-2        "              
 3-7        String         
 8-8        "              
 9-8        EndMarker

into either of these expressions, both of which could (I think) error:

@(Identifier"String")   # This seems to be what 1.6 parsed: Identifier"String" is a string macro call, i.e. `@Identifier_str "String"`

(@Identifier"String")   # We could just disallow juxtaposing a macro call with a string?

It seems like both of those approaches would be robust to qualified names?

Oct 09 '24 21:10 NHDaly

But that said:

> but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)

+1

Oct 09 '24 21:10 NHDaly