csharplang
[Proposal]: Raw string literal
Raw string literal
- [x] Proposed
- [x] Prototype: No prototype needed.
- [x] Implementation: In: https://github.com/dotnet/roslyn/tree/features/RawStringLiterals
- [x] Specification: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/raw-string-literal.md
Summary
Allow a new form of string literal that starts with a minimum of three """ characters (but no maximum), optionally followed by a new_line, the content of the string, and then ends with the same number of quotes that the literal started with. For example:
var xml = """
          <element attr="content"/>
          """;
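A minimal sketch of the "same number of quotes" rule from the summary, assuming a C# 11 or later compiler. The variable names and sample text are illustrative only.

```csharp
using System;

// Three quotes suffice when the content has no run of three or more quotes.
var xml = """
    <element attr="content"/>
    """;

// Content that itself contains """ needs a longer delimiter: four quotes here.
var nested = """"
    He wrote """quoted""" in the margin.
    """";

Console.WriteLine(xml);
Console.WriteLine(nested);
```

The delimiter only has to be longer than the longest run of quotes in the content, so a conflict can always be avoided by adding quotes.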
Spec: https://github.com/dotnet/csharplang/blob/main/proposals/raw-string-literal.md
Special thanks to @jnm2 for a deep review of this proposal
Is there a way for that literal to be combined with $ to embed tokens, or is this for completely literal strings only?
string exampleJson= $"""
{{
"name" = "{this.thingName}"
}}""";
(expecting that the answer is 'no - raw means raw with no interpolation' )
> Is there a way for that literal to be combined with $ to embed tokens, or is this for completely literal strings only?
I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.
The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with JSON, will be just as painful as in the past.
So, tbh, I believe this should just be for raw strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)
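To make the escaping problem concrete, here is a sketch that writes one small JSON snippet as a regular string, a verbatim string, and a raw string. It assumes C# 11 and .NET 6 or later (for `ReplaceLineEndings`); the JSON content is made up for illustration.

```csharp
using System;

// Regular string: every quote and newline is escaped.
var regular = "{\n  \"name\": \"value\"\n}";

// Verbatim string: quotes are doubled, and indentation becomes part of the value.
var verbatim = @"{
  ""name"": ""value""
}";

// Raw string: the snippet is pasted unchanged.
var raw = """
    {
      "name": "value"
    }
    """;

// Verbatim and raw literals take their line breaks from the source file,
// so normalize before comparing.
Console.WriteLine(verbatim.ReplaceLineEndings("\n") == regular);
Console.WriteLine(raw.ReplaceLineEndings("\n") == regular);
```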
In my opinion, Example 2 is confusing and should throw an error. The closing delimiter must be on its own line, so the string doesn't end with a newline, just as it doesn't start with one. If you want an empty line at the end, add an empty line. Perhaps this also makes it easier for the parser.
"Perhaps this makes it also easier for the parser." @Tragen agreed, that would also allow strings of quotes to appear mid-string, however how would you indicate if the block of text ends in a new line or not?
That's easy. Add an empty line.
No empty line at the end. Last character in the string is >
var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>
          """;
with empty line at the end
var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>

          """;
For me, that is much more intuitive and logical
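For comparison, the feature as it later shipped in C# 11 follows this rule for multi-line literals: the closing delimiter sits on its own line, and the line break just before it is not part of the value. The sketch below assumes a C# 11 or later compiler; the variable names are illustrative.

```csharp
using System;

// Closing delimiter directly after the last content line: no trailing newline.
var noTrailingNewline = """
    <element attr="contents">
      <body>
      </body>
    </element>
    """;

// A blank line before the closing delimiter adds one trailing newline.
var trailingNewline = """
    <element attr="contents">
      <body>
      </body>
    </element>

    """;

Console.WriteLine(noTrailingNewline.EndsWith('>'));
Console.WriteLine(trailingNewline.EndsWith('\n'));
```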
> Is there a way for that literal to be combined with $ to embed tokens, or is this for completely literal strings only?
> I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.
> The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with JSON, will be just as painful as in the past.
> So, tbh, I believe this should just be for raw strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)
Actually, I believe the escaping problem is about double-quote marks specifically. A raw string combined with interpolation should therefore be possible, and useful for at least HTML, XML and non-C markup/languages, but this is something that can be deferred for the future.
Examples:
example[0] = $"""<a href="{url}">{label}</a>"""
example[1] = $"""<tiger age="{age}"><eyes colour="{eye_color}" count="2"></tiger>"""
(I am using a single-line mode for these examples for brevity)
Examples with raw strings that would have braces:
var templateName = "C# Example Generator";
$"""
void Example(string Name)
{{
    Console.WriteLine($"Hello {{Name}}, Welcome to {templateName}");
}}
"""
Although the braces still need escaping, the ability to include raw double-quotes makes this much easier to read.
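For reference, the design that eventually shipped in C# 11 addresses the brace problem by letting the number of leading $ characters set how many braces open an interpolation hole. This was not yet on the table at this point in the thread; the sketch below assumes a C# 11 or later compiler and reuses the example above.

```csharp
using System;

var templateName = "C# Example Generator";

// Two $ signs: {{ }} marks an interpolation hole, single braces are literal text.
var code = $$"""
    void Example(string Name)
    {
        Console.WriteLine($"Hello {Name}, Welcome to {{templateName}}");
    }
    """;

Console.WriteLine(code);
```

With this form neither the quotes nor the braces of the generated C# need escaping.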
> Is there a way for that literal to be combined with $ to embed tokens, or is this for completely literal strings only?
> I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.
> The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with JSON, will be just as painful as in the past.
> So, tbh, I believe this should just be for raw strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)
I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.
example[0] = """raw string here"""; //closing quote is found on the same-line, so there is no multi-line processing to do
example[1] = """multiline string here
with no de-indentation, because
the string opener was not followed by new-line""";
example[2] = """
    this string can be de-indented
    because the string opener
    was directly followed by new-line"""; //it shouldn't matter if the string closer is here, or on the following line, the first line's indentation is the reference-point.
example[3] = """
    this also means, that indentation
        can increase above the base-line
        the same amount of spaces are
    still removed according to the base-line
"""; //even if the string closer has zero indent
Perhaps it isn't impossible to implement, but it would be much more complex or the spec-system isn't flexible enough?
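For comparison with the examples above, here is a sketch of de-indentation as it works in the feature that shipped in C# 11, where the reference point is the closing delimiter's own indentation rather than the first content line. It assumes a C# 11 or later compiler and .NET 6 or later; names and text are illustrative.

```csharp
using System;

// The whitespace before the closing delimiter is removed from every line;
// lines indented further keep the extra spaces.
var text = """
    base line
        indented further
    back to base
    """;

// A closing delimiter at column zero removes nothing.
var kept = """
    not trimmed
""";

Console.WriteLine(text);
Console.WriteLine(kept);
```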
@Tragen
> In my opinion, Example 2 is confusing and should throw an error. The closing delimiter must be on its own line, so the string doesn't end with a newline, just as it doesn't start with one. If you want an empty line at the end, add an empty line. Perhaps this also makes it easier for the parser.
I disagree. There's nothing confusing about the closing quote being on the same line. It's exactly how text blocks work in Python and Java and it is not a problem in either of those languages.
@HaloFour A lot of other languages disagree with you. When you can have it on the same line, then you would have an empty line at the beginning in all of the examples in the first post.
@Tragen
Other languages are welcome to do what they wish, but given two major languages have adopted the behavior proposed above it demonstrates that there is nothing inherently confusing about it.
Just because major languages have a feature doesn't automatically mean that it isn't confusing. E.g. C++ is very confusing.
@Tragen
Many different ways to skin that cat. To be honest I kind of prefer C++'s general approach to raw strings over text blocks since you're given a lot of flexibility to customize the delimiters while still retaining the syntax of a string (unlike heredocs in many languages). See the syntax I originally proposed here: https://github.com/dotnet/csharplang/discussions/89
I will admit that having the closing delimiter on a separate line does make it easier to control the indentation without including that final newline character, and Cyrus was a little surprised that Java does include that newline when the delimiter is on the next line (so does Python).
> Just because major languages have a feature doesn't automatically mean that it isn't confusing.
It does help with the argument though. Ultimately, either approach will need to be learned. Given that this doesn't really seem to have been a problem for many other languages, I'm not too worried for us. That said, I'm certain we'll discuss that option when we design this.
> For me, that is much more intuitive and logical
I'm certain we'll discuss this during the design process.
> I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.
I'm certain we'll discuss this during the design process.
> Although the braces still need escaping,
We'll likely discuss this, though I'm personally against it. It will depend on what the rest of the LDM wants here.
Needing to escape defeats the purpose here. Once you have to escape something, you're back where you started. The goal of these strings was to allow you to embed any content and not have to deal with escaping at all.
There's a conflation of two different issues here:
- Supporting the ability to define raw string literals which require no escaping.
- Trimming indentation whitespace from literals.
I don't see that they necessarily have to come packaged together.
For example I would often want to indent interpolated strings as well.
It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them. I imagine the main use case would be tests, where such overhead would be marginal.
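A rough sketch of what such a runtime helper could look like. `TrimIndentation` is hypothetical (no such method exists in .NET), and this version simply removes the smallest leading-whitespace width found on any non-blank line, counting spaces and tabs alike. It assumes .NET 6 or later for `ReplaceLineEndings`.

```csharp
using System;
using System.Linq;

// Hypothetical helper: strips the smallest leading-whitespace width
// found on any non-blank line from every line.
static string TrimIndentation(string text)
{
    var lines = text.ReplaceLineEndings("\n").Split('\n');

    // Smallest count of leading spaces/tabs among lines that have content.
    var indent = lines
        .Where(line => line.Trim().Length > 0)
        .Select(line => line.Length - line.TrimStart(' ', '\t').Length)
        .DefaultIfEmpty(0)
        .Min();

    // Blank lines may be shorter than the indent, so collapse them to empty.
    return string.Join("\n", lines.Select(line =>
        line.Trim().Length == 0 ? "" : line.Substring(indent)));
}

var sql = TrimIndentation(@"
        select *
          from TX
        where id = 1");

Console.WriteLine(sql);
```

Unlike a compile-time rule, this runs on every call and yields a value that is not a constant, which is the overhead being weighed here.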
> It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them
My position is that that's what would be wanted the majority of times. As such, doing it by default should just be how the language works. Why foist it on the user to have to add that extra work when it can just be the default oob behavior?
> I don't see that they necessarily have to come packaged together.
They don't. But if we do raw strings, I think we might as well do both to allow the literals to be ergonomically formatted without any downsides.
I'm sure though that we'll discuss this in the design meetings.
@YairHalberstadt
Java went through a similar design process and initially considered them separate with the inclusion of a helper method to align and trim the incidental whitespace. That was found to be more confusing and unattractive. Furthermore, since the helper method at runtime had less information regarding the formatting of the source around the string it ended up being necessary to include sentinel characters within the String to help inform it as to where the margin was supposed to be.
See: https://openjdk.java.net/jeps/326
I agree with Cyrus, the margin trimming is the most common thing you'd want to do and it's trivial to manage how the compiler behaves by the positioning of the delimiters. IDEs can include visual hints as to where the margin will be (as IntelliJ does with Java).
> IDEs can include visual hints as to where the margin will be
Yes. I intend to do this as part of the implementation.
This is great. By the time I got to the examples, they were already doing everything I intuitively wanted them to be doing. The indentation removal (or lack of indentation inclusion) is excellent and I would like to use it for things like EF/Dapper SQL queries.
I like the fact that you can explicitly include or exclude an ending newline by putting """ on the same line as the last line vs putting it on the next line. If there was a totally blank line before the ending """, I would strongly intuit that there would be two ending newlines. On the other hand, I could get used to anything. A newline is excluded at the top every time already.
There are a bunch of cases where I'd love to be able to use interpolation together with not having to escape double quote characters. For example: https://github.com/nunit/nunit3-vs-adapter/blob/master/src/NUnit.TestAdapter.Tests.Acceptance/SinglePassingTestResultTests.cs#L47-L60 Using raw strings without interpolation just for the benefit of excluding indentation and not having to escape quotes is probably something that would be quite hard to read if you have to inject values.
Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was in git and autocrlf was set to true, the string had different line endings when compiled on Linux vs Windows, leading to different behaviors.
I personally think it would be cool to have a way to specify what line endings the string should have and not rely on the line endings of the file itself.
> Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was in git and autocrlf was set to true, the string had different line endings when compiled on Linux vs Windows, leading to different behaviors.
> I personally think it would be cool to have a way to specify what line endings the string should have and not rely on the line endings of the file itself.
This is worthy of consideration. Some prefix sign ahead of the string (as we have $ and @ now) perhaps? I was also thinking a lot of "tabs to/from spaces" converters may need to get smarter here too. Having the IDE clearly indicate the common indent and show if it has a mix of tabs and spaces in it would be very helpful.
Allowing string interpolation seems reasonable to me. This does reduce the "can paste anything" ability, and more makes it an easier way to include blocks of text with quotes in them. However that seems a reasonable trade-off as it's very opt-in (only works if user places the $ in front)
Finally, the original proposal to have the closing quotes on their own line seems sensible to me. Imagine overwrite-pasting a good chunk of text - so much easier to select whole lines than to select many lines and then all bar the last N characters of the last line. I would much prefer to have the closing quotes on their own line.
> Any thoughts on line endings?
I would preserve them as is. It's intentionally a raw string, not an interpreted one. :-)
> As the source code was in git and autocrlf was set to true, the string had different line endings when compiled on Linux vs Windows, leading to different behaviors.
Sounds like a problem for all strings. Don't do that :-D
> I would preserve them as is.
So different behavior based on what OS the code is built on?
> So different behavior based on what OS the code is built on?
No. I would preserve them as is. So whatever the contents of the file are. Do not use auto-crlf. It's unnecessary and outright broken.
The two ecosystems have tools that are fine with either line ending. Having your source control tool messing with this just isn't a good idea.
> > So different behavior based on what OS the code is built on?
> No. I would preserve them as is. So whatever the contents of the file are. Do not use auto-crlf. It's unnecessary and outright broken.
> The two ecosystems have tools that are fine with either line ending. Having your source control tool messing with this just isn't a good idea.
I agree with auto-crlf being dodgy, although it is very popular.
Visual Studio itself suggests to me occasionally that I have mixed line endings in files and suggests to fix them. I suppress that message - we're a small team and 99% of the time just use Visual Studio. However those mixed line endings still sneak in and it'd be tricky if they were to cause confusion in things like string lengths being different.
For my use cases that come to mind it wouldn't hurt me but being sure of it would be nice.
> Visual Studio itself suggests to me occasionally that I have mixed line endings in files and suggests to fix them. I suppress that message - we're a small team and 99% of the time just use Visual Studio. However those mixed line endings still sneak in and it'd be tricky if they were to cause confusion in things like string lengths being different.
Honestly, if you have intentional mixed newlines, the right solution there IMO is simply to be explicit with line endings, i.e. actually use real escapes like \r\n. This is explicit, safe, and always resilient to whatever tooling you have in your development stack.
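A small sketch of the two options, assuming C# 11 and .NET 6 or later. Raw string literals do not process escapes, so explicit \r\n has to live in a regular string. Alternatively, a raw literal can be normalized where it is used; `ReplaceLineEndings` is a real .NET method, but applying it here is only one possible approach.

```csharp
using System;

// Line endings spelled out as escapes: identical on every platform and checkout.
var explicitCrLf = "line one\r\nline two\r\n";

// A raw literal takes its line breaks from the source file, so normalize
// at the point of use when a specific convention is required.
var fromSource = """
    line one
    line two

    """;
var normalized = fromSource.ReplaceLineEndings("\r\n");

Console.WriteLine(normalized == explicitCrLf);
```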
Can you include the ability to add a language designator?
var s = """SQL
select * from TX
""";
then tooling (VS) will colorize content, and maybe other features.
What if the content includes many " characters? It may be better to pair the delimiter with a number, so that delimiters stay short:
var s = """3
""""""""""1""""""""2222""""""""""
"""3;
The delimiter """ followed by a number can't appear in the content. That is way better than """"""""""".
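Under the quote-counting rule from the proposal summary (not the numbered-delimiter idea above), the delimiter length needed for a given piece of content can be computed mechanically. `RequiredDelimiterLength` is a hypothetical helper written for illustration and is not part of any API.

```csharp
using System;

// Hypothetical helper: how many quotes a raw-string delimiter needs for the given content.
static int RequiredDelimiterLength(string content)
{
    int longestRun = 0, currentRun = 0;
    foreach (var ch in content)
    {
        // Extend the current run on a quote, reset it on anything else.
        currentRun = ch == '"' ? currentRun + 1 : 0;
        longestRun = Math.Max(longestRun, currentRun);
    }

    // One more quote than the longest run inside, and never fewer than three.
    return Math.Max(3, longestRun + 1);
}

Console.WriteLine(RequiredDelimiterLength("<a href=\"x\">"));
Console.WriteLine(RequiredDelimiterLength("say \"\"\"\"\" hi"));
```

The delimiter grows only with the longest consecutive run of quotes, not with the total number of quotes in the content.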