syntax for string interpolation

Open gavinking opened this issue 8 years ago • 70 comments

We have received many complaints about the syntax for string interpolation in Ceylon. The double-backtick syntax was chosen because:

  • double backticks are incredibly uncommon in regular text,
  • we thought it looked quite visually pleasing, and
  • it was very easy to lex, and therefore I figured it would cause fewer problems in the IDE.

In the end, the last item has not, to my mind, worked out anywhere near as well as I expected, and I think my reasoning on that was flawed.

Today I finally broke down, swallowed my pride and tried my hand at implementing something else. By wrapping the ANTLR token stream, and recursively lexing string tokens, I've been able to add support for the following syntax:

print("Hello, \(name)!");

Now, you were probably expecting this to be the more-common ${name} instead of the less-common \(name). So why did I go for something slightly less familiar?

Well, \ is already the escape character in strings, and $ is not. So \(...) is backward compatible. Also, \{...} already means a Unicode escape sequence, while \( has no existing meaning. So \(...) is unambiguous.

Now, using the exact same technique, I could implement support for either ${ ... } or \{ ... } though the first would not be backward compatible, and the second would be a bit of a fiddle because the syntax would mean different things depending upon what occurs within the braces.

On the other hand, I think \( ... ) looks good.

I will push this to a branch, and I would like to hear some feedback.

gavinking avatar Sep 04 '17 15:09 gavinking
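A rough sketch of the "recursively lex string tokens" idea from the post above, written in Python rather than the ANTLR/Java of the real implementation. The helper name `split_interpolations` and the segment tuples are made up for illustration, and the actual wrapper around the token stream may work quite differently. It assumes a first scan has already produced the string token and that interpolated expressions contain no string literals.

```python
def split_interpolations(body):
    """Split the body of an already-scanned string token into segments.

    Returns a list of ("text", s) and ("expr", s) tuples. Hypothetical
    helper: a real lexer would go on to re-lex each "expr" segment.
    """
    segments = []
    text = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            if body[i + 1] == "(":
                # Start of \( ... ): find the matching close paren.
                depth = 1
                j = i + 2
                while j < len(body) and depth > 0:
                    if body[j] == "(":
                        depth += 1
                    elif body[j] == ")":
                        depth -= 1
                    j += 1
                if depth != 0:
                    raise ValueError("unterminated \\( in string")
                if text:
                    segments.append(("text", "".join(text)))
                    text = []
                segments.append(("expr", body[i + 2 : j - 1]))
                i = j
                continue
            # Any other escape: keep both characters as literal text.
            text.append(body[i : i + 2])
            i += 2
            continue
        text.append(ch)
        i += 1
    if text:
        segments.append(("text", "".join(text)))
    return segments
```

For the body of "Hello, \(name)!" this yields a text segment, an expression segment holding name, and a final text segment. An escaped backslash followed by a parenthesis stays plain text.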

P.S. \(stuff) is what Swift uses, FTR.

gavinking avatar Sep 04 '17 15:09 gavinking

Just to play devil's advocate here: what is the proposed migration path from double backticks to this new syntax?

luolong avatar Sep 04 '17 16:09 luolong

@luolong I don't plan to remove the old syntax completely.

gavinking avatar Sep 04 '17 16:09 gavinking

If this is the time to break compatibility, then we should go with the more popular ${foo}.

chochos avatar Sep 04 '17 16:09 chochos

To be honest I really prefer the \() syntax. Not only does it use the already well-known \ character for escaping, but it also uses the familiar () for grouping expressions. ${} feels completely alien to the language; I don't have a strong preference towards either \() or `` when compared to each other, but I do prefer either over ${}.

ghost avatar Sep 04 '17 16:09 ghost

I don't have a strong preference towards either \() or ``

I guess I don't because I'm already used to ``; I think if I wasn't, I'd prefer \().

ghost avatar Sep 04 '17 17:09 ghost

I too would prefer the Swift style \() to anything with a $; reminds me of BASIC 😆.

fwgreen avatar Sep 04 '17 17:09 fwgreen

I kinda like ${} because it's used in other languages, but I suppose it would be just as easy to use \(). As long as we can get rid of those `` which are a PITA to type on azerty keyboards, I'm OK.

I have no strong feelings one way or the other, both options are good.

bjansen avatar Sep 04 '17 18:09 bjansen

@Zambonifofex distilled my feelings toward the issue, I think.

arseniiv avatar Sep 04 '17 18:09 arseniiv

I don't think it matters much, but, FTR, in order of easiness-to-type, I have:

  • \{}
  • \()
  • ${}

gavinking avatar Sep 04 '17 20:09 gavinking

Since this doesn't much impact any other code (it's basically just a new class that wraps CeylonLexer), and since in order to meaningfully try this out, you'll need IDE support, I've pushed my implementation c65a4cb to master.

Please try it and give me some feedback.

Note that this will have some impact on the performance of the scanner, and thus of syntax highlighting. However, from what I've seen, this won't be noticeable.

gavinking avatar Sep 04 '17 20:09 gavinking

I don't think it matters much, but, FTR, in order of easiness-to-type, I have:

\{} \() ${}

For readability, I find most to least readable:

  • \(name)
  • \{name}
  • ${name}

I find that \ and () present less visual noise around the actual variable/expression in the interpolation than $ and {} do. This makes it visually easier with \() to immediately pick out the variable/expression in the interpolation.

lucono avatar Sep 04 '17 20:09 lucono

On my German QWERTZ keyboard, ` (and $ and ()) are reachable with just Shift, while {} and \ need the AltGr modifier. So $() would be easiest to type ;-)

ePaul avatar Sep 04 '17 23:09 ePaul

If we are trying to make our language more in-line with its siblings, then ${} is definitely the way to go, I'm afraid. Changing one unique syntax to an extremely uncommon one won't be seen as an improvement by anyone regardless of what we personally prefer. I also think the fight between \() and ${} is not worth alienating new users for.

FroMage avatar Sep 05 '17 11:09 FroMage

Well the problem with using ${} is it introduces a new character that must be escaped, and breaks reasonable code. If it's gotta be braces, I would much prefer \{} which reuses the existing escape char.

gavinking avatar Sep 05 '17 21:09 gavinking
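To make the backward-compatibility point concrete, here is a small Python sketch (not compiler code) that checks whether an existing string literal body would start an interpolation under a given opener. The function name `has_unescaped` is hypothetical. It assumes, as the thread implies, that \( is not a legal escape in existing Ceylon strings, so valid old code never contains it, whereas ${ can appear in ordinary text.

```python
def has_unescaped(body, opener):
    """Return True if `opener` occurs in `body` outside an existing escape.

    `body` is the raw text between the quotes of a string literal.
    """
    i = 0
    while i < len(body):
        if body.startswith(opener, i):
            return True
        # Skip a backslash escape pair so an escaped backslash followed
        # by "(" is not mistaken for the two-character opener.
        i += 2 if body[i] == "\\" and i + 1 < len(body) else 1
    return False
```

Under these assumptions, an old literal such as "Total: ${amount}" would silently change meaning if ${ became the opener, while a literal containing an escaped backslash followed by ( is unaffected by the \( opener.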

I wouldn't like having an ambiguous syntax where \{} could mean both unicode and expression. So I definitely prefer \().

xkr47 avatar Sep 06 '17 06:09 xkr47

@gavinking Out of curiosity, what problems does the original syntax cause?

xkr47 avatar Sep 06 '17 06:09 xkr47

If we are trying to make our language more in-line with its siblings, then ${} is definitely the way to go, I'm afraid. Changing one unique syntax to an extremely uncommon one won't be seen as an improvement by anyone regardless of what we personally prefer. I also think the fight between \() and ${} is not worth alienating new users for.

While I agree with the general sentiment, I don't think in particular that using the \() syntax for string interpolation is going to alienate new users of the language (who have accepted shared, variable, value, formal, satisfies, etc). IMO, Ceylon offers a stronger and more consistent message/value of sensible (and usually, "innovative") choices that don't always necessarily align with the state of affairs in other languages. Yet, in this case, this syntax is also already used by Swift, a language that's used by and familiar to very many programmers.

I would much prefer \{} which reuses the existing escape char

As already mentioned, Swift (which is not unpopular) uses \(), so this syntax would not be terribly unfamiliar. \{} on the other hand is a mix of the two different styles \() and ${} and ends up being neither of these familiar styles.

lucono avatar Sep 06 '17 06:09 lucono

this syntax is also already used by Swift, a language that's used by and familiar to very many programmers

Sure, to every iOS programmer, but that's about it. Not very many of them do anything else but Swift, let alone Ceylon.

FroMage avatar Sep 06 '17 08:09 FroMage

@xkr47

I wouldn't like having an ambiguous syntax where \{} could mean both unicode and expression.

Ah yes, my bad, "\{#03A0}" would actually be completely ambiguous.

Forget \{}, that wouldn't work.

gavinking avatar Sep 06 '17 08:09 gavinking
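The ambiguity can be shown with a tiny Python sketch. It assumes that #03A0 is both the body of a Unicode escape and, on its own, a hexadecimal integer literal in Ceylon; the function name `hex_readings` is made up for illustration and only the hex form is considered.

```python
import re

# Simplified: a '#' followed by hex digits, nothing else.
HEX_BODY = re.compile(r"#[0-9A-Fa-f]+\Z")


def hex_readings(body):
    """Return both readings of the text inside \\{...}, or None.

    None means the body is not of the '#hex' form this sketch covers.
    """
    if not HEX_BODY.match(body):
        return None
    value = int(body[1:], 16)
    return {
        "unicode escape": chr(value),           # the character itself
        "interpolated hex literal": str(value)  # the integer, as text
    }
```

For #03A0 the first reading is the single character U+03A0 and the second is the text 928, so the same source would produce two different strings depending on which rule wins.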

FTR: I finally have a robust implementation of this, which was incredibly painful, frankly.

It's worth noting one thing about this. Whereas this is perfectly correct, using backticks:

"foo``bar("bar")``bar"

This is not accepted by the scanner:

"foo\(bar("bar"))bar"

Same for:

"foo${bar("bar")}bar"

You can't nest string literals inside the new escape syntax, because the first scanning phase results in the tokens "foo\(bar(", bar, "))bar".

We still have to decide between \() and ${}.

gavinking avatar Sep 07 '17 09:09 gavinking
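The token split described above can be reproduced with a toy regex tokenizer in Python. The `TOKEN` pattern is a simplified assumption for illustration, not the Ceylon grammar: a string literal is a quote, any run of ordinary characters or backslash escape pairs, then the next unescaped quote.

```python
import re

# Simplified token patterns, assumed for illustration only:
# string literal | identifier | any other single non-space character.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z_]\w*|\S')


def first_phase(source):
    """Tokenize with no knowledge of interpolation, as a regex scanner would."""
    return TOKEN.findall(source)
```

Because the pattern knows nothing about \( or ${, the first quote inside the interpolated expression simply ends the string token, which gives exactly the three tokens listed in the post.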

I think "everyone" expects ${name} these days, in particular with the new JavaScript template literals becoming so commonly used.

IMO it would be a mistake not to pick the least surprising option. Surely \() is annoying for strings containing regexps where you want to escape '(' too.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals

ghost avatar Sep 07 '17 15:09 ghost

@gavinking so do we have to use "foo\(bar(\"bar\"))bar", or can't we use string literals in there at all?

ePaul avatar Sep 07 '17 15:09 ePaul

@ePaul you can't have string literals inside interpolated expression at all.

gavinking avatar Sep 07 '17 16:09 gavinking

You can't nest string literals inside the new escape syntax.

I've seen constructs like this:

"foo``count == 1 then "" else "s"``"

Would those just have to keep using the old syntax?

CPColin avatar Sep 07 '17 17:09 CPColin

Would those just have to keep using the old syntax?

Yes.

gavinking avatar Sep 07 '17 17:09 gavinking

Well look, one principal reason why other languages have an additional escape character ($) for string interpolation is because they support stuff like "Hello $name!", and don't require the braces for single-token interpolation.

Is that something you guys wanted to be able to write in Ceylon?

gavinking avatar Sep 07 '17 18:09 gavinking

@notsonotso

Surely \() is annoying for strings containing regexps where you want to escape '(' too.

I don't see your point. It'd be as easy as today: regex("\\([0-9]{2,3}\\)").

Also, @gavinking, is it really not possible / too hard to support string literals inside interpolated expressions with \()/$()/${}?

ghost avatar Sep 07 '17 18:09 ghost

I don't see your point. It'd be as easy as today: regex("\\([0-9]{2,3}\\)").

That is not how we write regexes in Ceylon. We write: regex("""\([0-9]{2,3}\)""").

Also, @gavinking, is it really not possible / too hard to support string literals inside interpolated expressions with \()/$()/${}?

Using a regex-based scanner, it's absolutely impossible, AFAICT. I'm sure you could hack together something with a handwritten lexer.

gavinking avatar Sep 07 '17 19:09 gavinking
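As a sketch of what "something with a handwritten lexer" might look like, here is a recursive scanner in Python that finds the end of a string literal while allowing string literals inside \( ... ). This is purely illustrative and not how the Ceylon compiler works; the names `scan_string` and `scan_expression` are hypothetical, and it ignores character literals and comments inside expressions.

```python
def scan_string(src, start=0):
    """Return the index just past the string literal starting at src[start].

    Hand-written and recursive: inside \\( ... ) a nested string literal
    is scanned by calling scan_string again.
    """
    if src[start] != '"':
        raise ValueError("expected opening quote")
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return i + 1
        if ch == "\\" and src.startswith("(", i + 1):
            i = scan_expression(src, i + 2)
        elif ch == "\\":
            i += 2  # ordinary escape pair
        else:
            i += 1
    raise ValueError("unterminated string literal")


def scan_expression(src, i):
    """Return the index just past the ')' that closes an interpolation."""
    depth = 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            i = scan_string(src, i)  # nested literal, may itself interpolate
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated interpolation")
```

The mutual recursion is what a single regular expression cannot express: the scanner has to remember how deeply it is nested in order to decide whether a quote opens or closes a literal.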

I really feel like familiarity doesn't matter as much as you guys are making out. I think someone would have more trouble getting used to actual, formal, satisfies, etc. than to \() over ${}.

I feel like this choice should be made based on how the syntax harmonizes with the rest of Ceylon, and not based on other languages; just like how the choice was made for actual and friends.

Either way, in case anyone cares (I'm not sure if anyone does), here is a table of who prefers each syntax:

| syntax | people | amount | percentage |
|--------|--------|--------|------------|
| \() | @Zambonifofex, @fwgreen, @arseniiv, @lucono, @jean-morissette, @luolong, @xkr47, @gavinking | 8 | 57% |
| ${} | @chochos, @DiegoCoronel, @notsonotso, @bjansen, @FroMage, @jogro | 6 | 43% |

If anyone wants to be added to the table, just leave a comment here.

ghost avatar Sep 07 '17 19:09 ghost