Macros juxtaposed with strings incorrectly(?) receive a raw string literal
If you juxtapose a macro call directly with a string literal (no whitespace in between), the macro receives the raw, uninterpolated string, as if it were a string macro, even though it isn't one.
Here is an example:
julia> macro m(x)
           return esc(x)
       end
@m (macro with 1 method)
julia> @m "hey $(2+2)"
"hey 4"
julia> @m"hey $(2+2)"
"hey \$(2+2)"
julia> m"hey $(2+2)"
ERROR: LoadError: UndefVarError: `@m_str` not defined
in expression starting at REPL[4]:1
I think this is a bug. On Julia 1.6, which I happen to have installed and which doesn't yet use JuliaSyntax, we get this error instead:
julia> VERSION
v"1.6.7"
julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"
Stacktrace:
[1] top-level scope
@ none:1
I believe the mistake is that the macro call is being parsed as if it were a string macro call, since the argument matches what a string macro would receive:
julia> macro m_str(x)
           return esc(x)
       end
@m_str (macro with 1 method)
julia> m"hey $(2+2)"
"hey \$(2+2)"
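To make the difference concrete, here is a small sketch with a macro that returns its argument unevaluated, so you can see what each form is actually handed. The names `@show_arg` and `@show_arg_str` are made up for illustration. A regular macro call receives an `Expr(:string, ...)` that will interpolate, while a string macro receives a plain `String` with `$(2+2)` left as text. The juxtaposed `@show_arg"..."` form is deliberately left out, since that is exactly the version-dependent case reported above.

```julia
# Hypothetical helper macros: both return their argument as a quoted value
# instead of evaluating it, so we can inspect what the parser passed in.
macro show_arg(x)
    return QuoteNode(x)
end

# String-macro version: show_arg"..." calls @show_arg_str with the literal text.
macro show_arg_str(s)
    return QuoteNode(s)
end

a = @show_arg "hey $(2+2)"   # regular macro call: receives an interpolation Expr
b = show_arg"hey $(2+2)"     # string macro: receives the raw String

println(typeof(a), " => ", a)
println(typeof(b), " => ", repr(b))
```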
I see the current behavior on both Julia 1.10 and 1.12:
julia> versioninfo()
Julia Version 1.12.0-DEV.1173
Commit 169e9e8de1* (2024-09-09 15:10 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin23.5.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_SSL_CA_ROOTS_PATH =
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_SSL_CA_ROOTS_PATH =
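This is due to a lexing difficulty: the lexer needs to know which strings are raw while lexing, and it uses lexer state to guess (i.e., a heuristic based on the previous token). That heuristic fails here: when the `"` directly follows an identifier, the whole string, including `$x`, is lexed as a single raw String token, whereas with whitespace in between, `$` and `x` come out as separate tokens. We need to fix that.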
julia> collect(JuliaSyntax.Tokenize.tokenize("@m\"str\$x\""))
6-element Vector{JuliaSyntax.Tokenize.RawToken}:
0-0 @
1-1 Identifier
2-2 "
3-7 String
8-8 "
9-8 EndMarker
julia> collect(JuliaSyntax.Tokenize.tokenize("@m \"str\$x\""))
9-element Vector{JuliaSyntax.Tokenize.RawToken}:
0-0 @
1-1 Identifier
2-2 Whitespace
3-3 "
4-6 String
7-7 $
8-8 Identifier
9-9 "
10-9 EndMarker
I think it would suffice to track the two previous tokens and check whether one of them was an `@`; that should cover the cases that come up in practice.
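Here is a rough sketch of that idea, with token kinds modelled as plain Symbols. It is not the JuliaSyntax.Tokenize code: `next_string_is_raw`, `raw_by_one_token`, `raw_by_two_tokens` and the kind names are hypothetical, and the one-token rule is only a simplified stand-in for the current heuristic. It just shows that looking one token further back separates m"..." from @m"...", and that with a qualified name the `@` falls outside the two-token window.

```julia
# Hypothetical sketch only: NOT the JuliaSyntax.Tokenize implementation.
# Token kinds are plain Symbols (:At, :Identifier, :Whitespace, :Dot, ...).

# Simplified model of the current rule: a `"` directly after an identifier is
# treated as a string-macro prefix (m"..."), so the string is lexed raw.
raw_by_one_token(prev1::Symbol) = prev1 === :Identifier

# Proposed rule: also look two tokens back. If that token is `@`, the identifier
# is a macro name (@m"..."), so lex an ordinary, interpolating string instead.
raw_by_two_tokens(prev2::Symbol, prev1::Symbol) =
    prev1 === :Identifier && prev2 !== :At

# Decide whether the next `"` starts a raw string, given the kinds of the tokens
# lexed so far. `:None` stands in for "no token" at the start of input.
function next_string_is_raw(kinds::Vector{Symbol}; lookback::Int = 2)
    prev1 = isempty(kinds) ? :None : kinds[end]
    lookback == 1 && return raw_by_one_token(prev1)
    prev2 = length(kinds) >= 2 ? kinds[end-1] : :None
    return raw_by_two_tokens(prev2, prev1)
end

# Token kinds preceding the opening `"` in a few inputs.
string_macro = [:Identifier]                    # m"..."
macro_juxt   = [:At, :Identifier]               # @m"..."
macro_space  = [:At, :Identifier, :Whitespace]  # @m "..."
# @A.x"...": the `@` is four tokens back, outside the two-token window.
qualified    = [:At, :Identifier, :Dot, :Identifier]

for (name, kinds) in [("m\"...\"", string_macro), ("@m\"...\"", macro_juxt),
                      ("@m \"...\"", macro_space), ("@A.x\"...\"", qualified)]
    println(rpad(name, 12),
            "one-token raw: ", next_string_is_raw(kinds; lookback = 1),
            "   two-token raw: ", next_string_is_raw(kinds))
end
```

The only behavioral change is for @m"...", which the one-token rule lexes as raw and the two-token rule does not.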
(Unfortunately, Julia also allows syntax like @A.B.C.x"str" to mean A.B.C.@x"str", and making that also work would need feedback from the parser state to the lexer (uuugh!), but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)
> and making that also work would need feedback from the parser state to the lexer (uuugh!)
😢
🤔 I actually think you could leave the lexer as-is. Given the tokens above, couldn't we still raise an error later, during parsing or lowering, for the juxtaposed macro call and string? In other words, is it okay that we "incorrectly" lexed a raw string, since we're going to throw an error later anyway for the missing whitespace?
That would be consistent with the 1.6 behavior:
julia> @m"hey $(2+2)"
ERROR: syntax: invalid macro usage "@(@m_str # REPL[4], line 1 "hey $(2+2)")"
I think you could parse
0-0 @
1-1 Identifier
2-2 "
3-7 String
8-8 "
9-8 EndMarker
into either of these expressions, each of which could then raise an error(?):
@(Identifier"String") # This seems to be what 1.6 parsed, where Identifier"String" is itself the string-macro call `@Identifier_str "String"`
(@Identifier"String") # We could just disallow juxtaposing a macrocall with a string?
It seems like both of those approaches would be robust to qualified names?
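As a rough illustration of the first option, here is a hypothetical post-parse check that rejects a `:macrocall` whose name is itself a `:macrocall`, which is the shape the 1.6 error message suggests. `valid_macro_name` and `check_macro_names` are made-up names, the expressions are built by hand rather than produced by any parser, and this is not how Julia's lowering is actually implemented. Dotted names like `A.@x` are accepted, assumed here to be represented as `Expr(:., :A, QuoteNode(Symbol("@x")))`, which is one reason this approach might be robust to qualified names.

```julia
# Hypothetical lowering-style check, modelled loosely on 1.6's
# "invalid macro usage" error. Not Julia's actual implementation.

# Accept a Symbol starting with '@', a GlobalRef, or a dotted reference whose
# last component is a quoted macro name (assumed representation of A.@x).
function valid_macro_name(name)
    name isa Symbol && return startswith(string(name), "@")
    name isa GlobalRef && return true
    if Meta.isexpr(name, :., 2)
        q = name.args[2]
        return q isa QuoteNode && valid_macro_name(q.value)
    end
    return false
end

# Walk an expression and throw on any :macrocall with an invalid name, such as
# @(@m_str "...") where the "name" is itself a string-macro call.
function check_macro_names(ex)
    if Meta.isexpr(ex, :macrocall)
        name = ex.args[1]
        valid_macro_name(name) ||
            error("invalid macro usage: macro name is ", repr(name))
    end
    if ex isa Expr
        foreach(check_macro_names, ex.args)
    end
    return ex
end

lnn = LineNumberNode(1, :none)
ok        = Expr(:macrocall, Symbol("@m"), lnn, "hey")                  # @m "hey"
qualified = Expr(:macrocall, Expr(:., :A, QuoteNode(Symbol("@x"))), lnn, "str")
bad       = Expr(:macrocall, Expr(:macrocall, Symbol("@m_str"), lnn, "hey"), lnn)

check_macro_names(ok)
check_macro_names(qualified)
try
    check_macro_names(bad)
catch err
    showerror(stdout, err)
    println()
end
```

The second option, (@Identifier"String"), can't be checked this way after the fact, since the resulting expression looks the same as @Identifier "String"; it would have to be rejected in the parser, where the missing whitespace is still visible.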
But that said:
> but I really feel the @ in front of the module name is horrible syntax and should be deprecated!)
+1