Handle multiline strings
Multiline strings are allowed in Rust, but we currently do not handle them correctly:
test.rs:2:26: error: unended string literal
2 | let a = "whaaaaaat up
| ^
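For reference, here is a minimal Rust illustration of the two ways a string literal can continue onto the next line. A literal newline is kept in the value. A backslash at the end of a line is a line continuation, so the newline and the next line's leading whitespace are dropped. rustc accepts both, so our lexer has to as well. The function names are only for illustration.

```rust
/// A string literal spanning two source lines: the newline is part of the value.
fn multiline() -> &'static str {
    "whaaaaaat up
with multiline strings"
}

/// A backslash right before the line break is a line continuation:
/// the newline and the following line's leading whitespace are skipped.
fn continued() -> &'static str {
    "whaaaaaat up \
     with multiline strings"
}
```

Both functions compile with rustc. The first returns a value containing '\n'; the second returns a single-line string.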
This is the beginning of a patch to fix that. It basically comments out the checks for a '\n' character:
diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
index ecf151dc778..c51b00fb5fe 100644
--- a/gcc/rust/lex/rust-lex.cc
+++ b/gcc/rust/lex/rust-lex.cc
@@ -1917,7 +1917,7 @@ Lexer::parse_string (Location loc)
int length = 1;
current_char32 = peek_codepoint_input ();
- while (current_char32.value != '\n' && current_char32.value != '"')
+ while (/* current_char32.value != '\n' && */ current_char32.value != '"')
{
if (current_char32.value == '\\')
{
@@ -1949,14 +1949,15 @@ Lexer::parse_string (Location loc)
current_column += length;
- if (current_char32.value == '\n')
- {
- rust_error_at (get_current_location (), "unended string literal");
- // by this point, the parser will stuck at this position due to
- // undetermined string termination. we now need to unstuck the parser
- skip_broken_string_input (current_char32.value);
- }
- else if (current_char32.value == '"')
+ // if (current_char32.value == '\n')
+ // {
+ // rust_error_at (get_current_location (), "unended string literal");
+ // // by this point, the parser will stuck at this position due to
+ // // undetermined string termination. we now need to unstuck the parser
+ // skip_broken_string_input (current_char32.value);
+ // }
+ if (current_char32.value == '"')
+ // else if (current_char32.value == '"')
{
current_column++;
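To illustrate the behaviour the patch is aiming for, here is a minimal sketch in Rust (not the gccrs C++ lexer) of a string-literal scanner where only the closing quote ends the literal, and "unended string literal" is reported only when the input runs out. lex_string and LexError are hypothetical names, and only a small subset of Rust's escapes (\n, \t, \\, \") is handled; real Rust string literals also support \r, \0, \x.., \u{..} and line continuations.

```rust
/// Hypothetical error type for the sketch below.
#[derive(Debug, PartialEq)]
enum LexError {
    /// Input ended before the closing quote.
    UnendedString,
    /// An escape this sketch does not support.
    UnknownEscape(char),
}

/// Lexes the body of a string literal whose opening `"` has already been
/// consumed. Returns the decoded value and the number of bytes consumed,
/// including the closing quote.
fn lex_string(rest: &str) -> Result<(String, usize), LexError> {
    let mut value = String::new();
    let mut chars = rest.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // Only the closing quote ends the literal; '\n' is ordinary content.
            '"' => return Ok((value, i + 1)),
            '\\' => match chars.next() {
                Some((_, 'n')) => value.push('\n'),
                Some((_, 't')) => value.push('\t'),
                Some((_, '\\')) => value.push('\\'),
                Some((_, '"')) => value.push('"'),
                Some((_, other)) => return Err(LexError::UnknownEscape(other)),
                None => return Err(LexError::UnendedString),
            },
            // Everything else, including raw newlines, is kept as-is.
            _ => value.push(c),
        }
    }
    // Reaching end of input without a closing quote is the real error case.
    Err(LexError::UnendedString)
}
```

With this structure, the newline check disappears from the loop condition entirely instead of being commented out, and the error path moves to the end-of-input case.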
However, that check is necessary to properly handle some documentation attributes, as several test cases in our testsuite show.
rustc performs this check in a separate pass rather than in the lexer, and I think we should do the same. For example, we could add the check after parsing a doc_attr.
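A sketch of what such a separate pass could look like, assuming a hypothetical, simplified Attr type produced by the parser (gccrs's real AST is different). The lexer accepts the multiline string unconditionally, and only doc aliases are validated afterwards, using the same character predicate as rustc. Attr, is_bad_alias_char and check_doc_aliases are illustrative names.

```rust
/// Hypothetical, simplified attribute node as the parser might produce it.
enum Attr {
    /// `#[doc(alias = "...")]`, with the source line it came from.
    DocAlias { alias: String, line: u32 },
    /// Any attribute this pass does not care about.
    Other,
}

/// The character predicate from rustc's doc-alias check:
/// quotes and any whitespace other than a plain space are rejected.
fn is_bad_alias_char(c: char) -> bool {
    c == '"' || c == '\'' || (c.is_whitespace() && c != ' ')
}

/// Post-parse validation pass: returns (line, first offending char)
/// for every doc alias that contains a forbidden character.
fn check_doc_aliases(attrs: &[Attr]) -> Vec<(u32, char)> {
    attrs
        .iter()
        .filter_map(|attr| match attr {
            Attr::DocAlias { alias, line } => {
                alias.chars().find(|&c| is_bad_alias_char(c)).map(|c| (*line, c))
            }
            Attr::Other => None,
        })
        .collect()
}
```

Keeping the check here means an ordinary `let a = "..."` spanning several lines never reaches it, while `#[doc(alias = "foo\nbar")]` is still rejected.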
Here is the relevant rustc code, which rejects quotes and any whitespace other than a plain space in doc aliases:
if let Some(c) = doc_alias
    .chars()
    .find(|&c| c == '"' || c == '\'' || (c.is_whitespace() && c != ' '))
{
    self.tcx
        .sess
        .struct_span_err(
            meta.span(),
            &format!(
                "{:?} character isn't allowed in `#[doc(alias = \"...\")]`",
                c,
            ),
        )
        .emit();
    return false;
}
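The same check, pulled out of the rustc excerpt into a free function so the resulting message can be inspected (doc_alias_error is a hypothetical name). Note that {:?} formats the char with Debug, so a newline appears escaped as '\n' in the diagnostic rather than as a raw line break.

```rust
/// Mirrors the rustc doc-alias check: returns the diagnostic text for the
/// first forbidden character in `doc_alias`, or None if the alias is fine.
fn doc_alias_error(doc_alias: &str) -> Option<String> {
    doc_alias
        .chars()
        .find(|&c| c == '"' || c == '\'' || (c.is_whitespace() && c != ' '))
        // Debug formatting quotes and escapes the char, e.g. '\n'.
        .map(|c| format!("{:?} character isn't allowed in `#[doc(alias = \"...\")]`", c))
}
```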
Fixing this issue is necessary to properly compile certain versions of libcore, which contain multiline strings.
As a side note, I haven't been able to understand the new system that emits errors based on locale. I'll have to ask on the Rust Zulip for an explanation or a PR link, since I couldn't figure out where that error was emitted without checking out the 1.49 release.
The error is emitted by calling tcx.sess.emit_err(errors::DocAliasBadChar { span, attr_str, char_: c }), where DocAliasBadChar is defined in compiler/rustc_passes/src/errors.rs as:
#[derive(SessionDiagnostic)]
#[error(passes::doc_alias_bad_char)]
pub struct DocAliasBadChar<'a> {
    #[primary_span]
    pub span: Span,
    pub attr_str: &'a str,
    pub char_: char,
}
The actual error message is declared in compiler/rustc_error_messages/locales/en-US/passes.ftl as passes-doc-alias-bad-char = {$char_} character isn't allowed in {$attr_str}. The PR implementing this is https://github.com/rust-lang/rust/pull/95512.
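As far as I understand it, the derive turns each non-span field of the struct into a named argument, and the Fluent message refers to those arguments by name as {$char_} and {$attr_str}. The toy sketch below mimics only that name-based substitution. render and args are hypothetical helpers, not the fluent crate or rustc API, which also handle selectors, functions and locale fallback; rendering the char with Display is my own choice here.

```rust
/// Toy stand-in for a Fluent message lookup: replaces each `{$name}`
/// placeholder with the argument of the same name.
fn render(template: &str, args: &[(&str, String)]) -> String {
    let mut out = template.to_string();
    for (name, value) in args {
        // Build the literal placeholder text, e.g. "{$char_}".
        let placeholder = format!("{{${}}}", name);
        out = out.replace(placeholder.as_str(), value);
    }
    out
}

/// Simplified mirror of rustc's DocAliasBadChar, without the span.
struct DocAliasBadChar<'a> {
    attr_str: &'a str,
    char_: char,
}

impl<'a> DocAliasBadChar<'a> {
    /// Roughly what the derive provides: each field as a named argument.
    fn args(&self) -> Vec<(&'static str, String)> {
        vec![
            ("attr_str", self.attr_str.to_string()),
            ("char_", self.char_.to_string()),
        ]
    }
}
```

The point of the design is that the struct only carries data; the wording lives in the .ftl file and can be translated without touching the pass that emits the error.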
I had found the error message but couldn't figure out which diagnostic struct it belonged to or how it was emitted. Thanks a lot @bjorn3 :DD