Generalized string interpolation
Currently string interpolation can only create strings. It's a powerful template mechanism, but it's restricted to creating a string at the end.
If we instead allowed the string parts and values to be collected into a general interface, instead of just something like a StringBuffer, then other kinds of "literals" could use the feature.
Let's say we defined:
abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}
and then allowed the syntax <postfixExpression> <stringLiteral> to be used to provide an Interpolator which is called with the parts, returning a new (or the same) interpolator with the updated state, instead of just turning it all into a string.
Example:
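import 'dart:convert'; // for jsonEncode

class JsonInterpolator implements Interpolator<String, Object?> {
  final StringBuffer _buffer = StringBuffer();
  // Literal parts of the template are copied verbatim.
  JsonInterpolator addString(String string) {
    _buffer.write(string);
    return this;
  }
  // Interpolated values are JSON encoded before being written.
  JsonInterpolator addValue(Object? value) {
    _buffer.write(jsonEncode(value));
    return this;
  }
  String toValue() => _buffer.toString();
}
JsonInterpolator get jsn => JsonInterpolator();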
This class would allow you to write:
var myJson = jsn"""
{ $name: $value,
"other": [$v1, $v2],
"all": [
${for (var i = 0; i < values.length; i++) ...[if (i > 0) ",", values[i]] /* using #1478 */}
]
}
""";
and have all the values which are plugged into the string be JSON encoded first.
An expression of the form e stringLiteral would be a compile-time error if e was not assignable to Interpolator<X, Y> for some X and Y. It is a compile-time error if the elements of the interpolation are not assignable to Y. The static type of the expression is X.
The default interpolation is just an implicit Interpolator<String, Object?> which does .toString() on all values and concatenates the strings.
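As a rough sketch of what that implicit default could look like in today's Dart: the class name DefaultInterpolator and the helper describeSum are made up for illustration, and the chained calls stand in for what the compiler would generate for "$a + $b = ${a + b}".

```dart
// The interface from the proposal, repeated so the sketch is self-contained.
abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

// Hypothetical name: the proposal leaves the default interpolator implicit.
class DefaultInterpolator implements Interpolator<String, Object?> {
  final StringBuffer _buffer = StringBuffer();

  @override
  DefaultInterpolator addString(String string) {
    _buffer.write(string);
    return this;
  }

  @override
  DefaultInterpolator addValue(Object? value) {
    // StringBuffer.write converts the value with toString().
    _buffer.write(value);
    return this;
  }

  @override
  String toValue() => _buffer.toString();
}

// Hand-written equivalent of "$a + $b = ${a + b}" under the proposal.
String describeSum(int a, int b) => DefaultInterpolator()
    .addValue(a)
    .addString(' + ')
    .addValue(b)
    .addString(' = ')
    .addValue(a + b)
    .toValue();
```

Calling describeSum(1, 2) should give the same string as the ordinary interpolation "1 + 2 = 3".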
Grammar-wise this conflicts with r"string" if r refers to an Interpolator. Maybe we need to put a symbol between the two, but all the good symbols are taken. Maybe we could introduce a new syntax instead: <e>"string", it just looks a little too much like a type. We could make it a suffix instead, so "{$x:$y}"jsn, looking more like a RegExp flag. I think it's better for readability to be in front.
It would change auto-concatenation of string literals, because that only works for actual strings. A string literal with a non-default interpolator is not concatenated with any preceding string literals. It may apply to all of the following adjacent string literals.
It might even be possible to extend this to map and collection literals:
collectionBuilder {e1, e2, e3 }
mapBuilder {k1: v1, k2: v2, k3: v3}
with APIs like abstract class CollectionBuilder<T> { void add(T element); } and abstract class MapBuilder<K, V> { void operator []=(K key, V value); }. (It's not absolutely clear whether they need a toValue method as well, which would prevent using the current Set/List/Map APIs.)
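To make the builder idea a bit more concrete, here is a sketch in current Dart where cascades stand in for the proposed literal syntax. CountingBuilder, FirstWinsMapBuilder and the two helper functions are invented for this example and are not part of the proposal.

```dart
abstract class CollectionBuilder<T> {
  void add(T element);
}

abstract class MapBuilder<K, V> {
  void operator []=(K key, V value);
}

// Hypothetical builder that counts how often each element was added.
class CountingBuilder<T> implements CollectionBuilder<T> {
  final Map<T, int> counts = {};

  @override
  void add(T element) {
    counts[element] = (counts[element] ?? 0) + 1;
  }
}

// Hypothetical builder that keeps the first value when a key repeats.
class FirstWinsMapBuilder<K, V> implements MapBuilder<K, V> {
  final Map<K, V> entries = {};

  @override
  void operator []=(K key, V value) {
    entries.putIfAbsent(key, () => value);
  }
}

// Stands in for: countingBuilder {'a', 'b', 'a'}
Map<String, int> countWords() {
  final builder = CountingBuilder<String>()
    ..add('a')
    ..add('b')
    ..add('a');
  return builder.counts;
}

// Stands in for: firstWinsBuilder {'x': 1, 'y': 2, 'x': 3}
Map<String, int> firstWins() {
  final builder = FirstWinsMapBuilder<String, int>()
    ..['x'] = 1
    ..['y'] = 2
    ..['x'] = 3;
  return builder.entries;
}
```

Both builders expose their result through a field here, which sidesteps the open question of whether a toValue method is needed.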
Comparison to extensions
At first glance, there is not much difference in syntax compared to extensions:
final interpolated = json"{$name: $value}";
final withExtension = "{$name: $value}".json;
The real benefit is that with the Interpolator API it would be possible to map the interpolated values automatically with the correct encoding. Each value might get urlEncoded or jsonEncoded which is a manual step with the current string interpolation.
With correct encoding, the extension version needs an explicit jsonEncode call for each value, while the interpolator version stays the same:
final interpolated = json"{$name: $value}";
final withExtension = "{$name: ${jsonEncode(value)}}".json;
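The extension side of this comparison can already be tried in current Dart. The sketch below assumes that .json simply parses the finished string with jsonDecode (the thread does not say what the extension would do), and the function names are made up.

```dart
import 'dart:convert';

// Assumption: the .json extension parses the already-built string.
extension JsonString on String {
  Object? get json => jsonDecode(this);
}

// Every interpolated value has to be encoded by hand.
Object? withExtension(String name, Object? value) =>
    '{${jsonEncode(name)}: ${jsonEncode(value)}}'.json;

// Without the manual encoding the string is usually not valid JSON.
bool naiveVersionParses(String name, Object? value) {
  try {
    '{$name: $value}'.json;
    return true;
  } on FormatException {
    return false;
  }
}
```

The extension only ever sees the final string, so it has no way to tell which parts came from values. That is the information the Interpolator API would keep.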
All tokens at once
The proposed API splits handling of strings and interpolated values into two methods, which are called in the order the "tokens" appear in the interpolated string.
abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}
Maybe it would be easier for the implementer to get the full list of "tokens" at once. This would allow looking ahead to decide on the correct encoding for each value. That's also possible with the proposed API by building the list and parsing it when toValue() is called. But the list probably already exists, so why not expose it?
abstract class Interpolator<R, T> {
  R toValue(List<Token<T>> tokens);
}
abstract class Token<T> {
  bool get isString;
  bool get isValue;
  String get string;
  T get value;
}
I thought about an API like the Token thing, but decided against it.
It's an extra overhead. If you want it, you can define your own Token class and collect the values internally, so it doesn't provide more power, only less flexibility and more allocations. I'd like to not have more allocations than necessary.
If you want to throw in one of the add calls, then you can do so immediately, and not wait for the rest of the expressions to be evaluated.
It's a "push" API, so the caller knows whether it's a string or a value, the compiler can literally turn
var myJson = jsn"{ $name: $value }";
into
var myJson = jsn.addString("{ ").addValue(name).addString(": ").addValue(value).addString("}").toValue();
(There is an issue with my approach, though. The spread ...[if (i > 0) ",", something, ": "...] seems to assume that the strings in there are emitted as strings, but they will be values, so it's not possible to programmatically insert strings into the result without going through addValue. That is annoying.
Maybe we should allow an interpolation element, because interpolations should be elements anyway, of ..."something${foo}other" (with a leading ... like a spread, but it's a string expression, not an iterable) to add directly to the surrounding collector.)
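The separator problem can be reproduced with hand-written calls in current Dart. This sketch repeats the JsonInterpolator from earlier in the thread; separatorAsValue and separatorAsString are made-up helper names.

```dart
import 'dart:convert';

abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

class JsonInterpolator implements Interpolator<String, Object?> {
  final StringBuffer _buffer = StringBuffer();

  @override
  JsonInterpolator addString(String string) {
    _buffer.write(string);
    return this;
  }

  @override
  JsonInterpolator addValue(Object? value) {
    _buffer.write(jsonEncode(value));
    return this;
  }

  @override
  String toValue() => _buffer.toString();
}

// What the spread does today: the "," arrives through addValue
// and is therefore JSON encoded as a string value.
String separatorAsValue(List<int> values) {
  Interpolator<String, Object?> json = JsonInterpolator().addString('[');
  for (var i = 0; i < values.length; i++) {
    if (i > 0) json = json.addValue(',');
    json = json.addValue(values[i]);
  }
  return json.addString(']').toValue();
}

// What the ..."string" element would allow: the "," goes through addString.
String separatorAsString(List<int> values) {
  Interpolator<String, Object?> json = JsonInterpolator().addString('[');
  for (var i = 0; i < values.length; i++) {
    if (i > 0) json = json.addString(',');
    json = json.addValue(values[i]);
  }
  return json.addString(']').toValue();
}
```

For [1, 2] the first function yields [1","2], which is not valid JSON, while the second yields [1,2].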
I believe the remaining parts of an interpolated string, outside of $variable and ${expression}, would also be passed through addString().
Then, class g implements Interpolator<Characters, Object?> can be used to make grapheme cluster literals. #1432
But, the results can't be constants.
Is it correct?
You can create a gc"string" prefix which is just the normal behavior for string interpolations, except that toValue returns the .characters of the string.
class _GraphemeClusterInterpolator implements Interpolator<Characters, Object?> {
  final StringBuffer buffer = StringBuffer();
  addString(String value) {
    buffer.write(value);
    return this;
  }
  addValue(Object? value) {
    buffer.write(value.toString());
    return this;
  }
  toValue() => buffer.toString().characters;
}
Interpolator<Characters, Object?> get gc => _GraphemeClusterInterpolator();
You can also do things like xml "<html> ... </html>" which parses the string and returns a non-string value (like the JSON example). You can use a progressive/chunked parser and do things incrementally, without building the entire source first (and not need to convert $value to a string first, and then parse it back later, if the value must be a valid XMLNode).
It does mean that the result can't be const, not unless we increase the capability of constant computation significantly.
Possible use case DateFormat
While it isn't shorter, it could be a typesafe way to construct a DateFormat.
final DateFormat dateFormat = DateFormat("h:mm a");
final DateFormat dateFormatInterpolated =
df"${DfSymbol.hourInAmPm}:${DfSymbol.minuteInHour} ${DfSymbol.amPmMarker}";
class DateFormatInterpolator implements Interpolator<DateFormat, DfSymbol> { }
enum DfSymbol {
  // h
  hourInAmPm,
  // mm
  minuteInHour,
  // a
  amPmMarker
}
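A possible body for such an interpolator, sketched without the intl package: it builds the pattern string ("h:mm a") instead of a DateFormat, so the result type is String here, and the enum carries its pattern letters as a field. Both choices are assumptions made to keep the example self-contained.

```dart
abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

// Same symbols as above, with the pattern letters attached (an assumption).
enum DfSymbol {
  hourInAmPm('h'),
  minuteInHour('mm'),
  amPmMarker('a');

  const DfSymbol(this.pattern);
  final String pattern;
}

// Hypothetical: produces the pattern string rather than a DateFormat.
class DateFormatPatternInterpolator implements Interpolator<String, DfSymbol> {
  final StringBuffer _pattern = StringBuffer();

  @override
  DateFormatPatternInterpolator addString(String string) {
    // Literal text is passed through unchanged. That is fine for
    // punctuation and spaces; letters would need quoting in a real pattern.
    _pattern.write(string);
    return this;
  }

  @override
  DateFormatPatternInterpolator addValue(DfSymbol value) {
    _pattern.write(value.pattern);
    return this;
  }

  @override
  String toValue() => _pattern.toString();
}

// Hand-written equivalent of
// df"${DfSymbol.hourInAmPm}:${DfSymbol.minuteInHour} ${DfSymbol.amPmMarker}".
String timePattern() => DateFormatPatternInterpolator()
    .addValue(DfSymbol.hourInAmPm)
    .addString(':')
    .addValue(DfSymbol.minuteInHour)
    .addString(' ')
    .addValue(DfSymbol.amPmMarker)
    .toValue();
```

A real version would pass the resulting pattern to the DateFormat constructor in toValue().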
Multiple interpolated types
The question I was asking myself was whether it would also be possible to inject normal Strings via interpolation into such a date format interpolated string.
final String username = account.userName;
final DateFormat dateFormatInterpolated =
df"It's ${DfSymbol.hourInAmPm}:${DfSymbol.minuteInHour} ${DfSymbol.amPmMarker} for $username";
To make it work with the current API we need sum types #83. Then one could write
class DateFormatInterpolator implements Interpolator<DateFormat, DfSymbol|String> { }
@lrhn
You can create a gc"string" prefix which is just the normal behavior for string interpolations, except that toValue returns the .characters of the string.
It does mean that the result can't be const, not unless we increase the capability of constant computation significantly.
It sounds better, but not best, for me.
I agree that Dart doesn't need JSON-string-literals. A more realistic example would be XML-string-literals, where the values can be XML Nodes, or perhaps strings which are then properly escaped. Or SQL literals where again the values are escaped for you.
Or some kind of template system (like, Dart code generation). It's not perfect for that because it doesn't nest well. If I want to inline a list, I can't just do ${for (var x in list) ...[x, ", "]} because the comma string will be treated as a value, not a string.
Maybe I just need a more comprehensive syntax, like Scheme's quote/unquote.
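As an illustration of the "values are escaped for you" case, here is a sketch that escapes interpolated values for HTML text using htmlEscape from dart:convert. It returns a String rather than XML nodes, and HtmlTextInterpolator and paragraph are hypothetical names.

```dart
import 'dart:convert';

abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

// Hypothetical: literal parts are trusted markup, values are escaped text.
class HtmlTextInterpolator implements Interpolator<String, Object?> {
  final StringBuffer _buffer = StringBuffer();

  @override
  HtmlTextInterpolator addString(String string) {
    _buffer.write(string);
    return this;
  }

  @override
  HtmlTextInterpolator addValue(Object? value) {
    _buffer.write(htmlEscape.convert(value.toString()));
    return this;
  }

  @override
  String toValue() => _buffer.toString();
}

// Hand-written equivalent of html"<p>$text</p>".
String paragraph(String text) => HtmlTextInterpolator()
    .addString('<p>')
    .addValue(text)
    .addString('</p>')
    .toValue();
```

The markup written in the template survives as markup, while anything that arrives through a value cannot introduce new tags.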
I've been wanting named string templates for a long time. I think I have an internal doc from 2011 proposing it. :)
With the static metaprogramming stuff, @jakemac53 and I are considering using them to also provide a nicer syntax for constructing pieces of Dart syntax, like:
var e = expr"{}";
var s = stmt"{}";
Using bare strings has some problems because, as in the examples above, the language is ambiguous if you don't know what grammar production you are trying to parse. A {} is a block in a statement context and an empty map in an expression context. Using named string templates with different templates (here, expr and stmt) would provide the API enough context to know how to parse the string.
Personally, I'm not crazy about the push API you defined. I think it will give templates more flexibility to have a pull API. In particular, I'd rather the interpolated expressions be thunks so that the template handler can choose when/if to evaluate them, handle exceptions coming from them, etc.
Of course, if you start talking about wanting to give user code the ability to not evaluate some subexpressions, that starts to look a lot like a macro... So Jake and I have discussed a little about whether some kind of named string template thing should be a compile-time API that gets expanded using static metaprogramming. We don't have anything at all coherent for that yet, though.
But, overall, yes, I would love named string templates like this.
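To give the pull-style idea some shape, here is one way it could look if the template received the literal parts plus unevaluated thunks. The Template interface, SafeTemplate and demo are invented for illustration; nothing like this has been specified.

```dart
// Hypothetical pull-style interface: the handler gets the literal parts
// and one thunk per interpolated expression, and decides what to evaluate.
abstract class Template<R> {
  R build(List<String> strings, List<Object? Function()> thunks);
}

// Example handler that catches exceptions thrown by individual expressions.
class SafeTemplate implements Template<String> {
  const SafeTemplate();

  @override
  String build(List<String> strings, List<Object? Function()> thunks) {
    // strings has one more entry than thunks: s0 v0 s1 v1 ... sN.
    final buffer = StringBuffer(strings.first);
    for (var i = 0; i < thunks.length; i++) {
      try {
        buffer.write(thunks[i]());
      } catch (_) {
        buffer.write('<error>');
      }
      buffer.write(strings[i + 1]);
    }
    return buffer.toString();
  }
}

// Hand-written equivalent of safe"a=${1}, b=${throw StateError('boom')}".
String demo() => const SafeTemplate().build(
      ['a=', ', b=', ''],
      [() => 1, () => throw StateError('boom')],
    );
```

With the push API the StateError would escape before the handler could react; here the handler turns it into a placeholder.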
@lrhn can we use encoding constants as prefix:
const List<int> hello = utf8'Hello, 世界';
String literals are already utf-8 without prefix. And, I think String literals with prefix in this proposal can't be constants.
By the way, there is no definition of the encoding of source code in the spec. What should define the encoding of source code?
String values are encoded as UTF-16 (or rather, they are sequences of UTF-16 code units, not necessarily valid UTF-16), not UTF-8. The proposed idea here should be able to create a Uint8List from utf8'some text'. It will do so at run-time, from the string values.
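A sketch of how such a utf8 tag could produce the bytes at run-time with the proposed interface. Utf8Interpolator and utf8Bytes are made-up names, and the explicit call chain stands in for utf8'some text'.

```dart
import 'dart:convert';
import 'dart:typed_data';

abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

// Hypothetical tag: behaves like normal interpolation, then encodes as UTF-8.
class Utf8Interpolator implements Interpolator<Uint8List, Object?> {
  final StringBuffer _buffer = StringBuffer();

  @override
  Utf8Interpolator addString(String string) {
    _buffer.write(string);
    return this;
  }

  @override
  Utf8Interpolator addValue(Object? value) {
    _buffer.write(value);
    return this;
  }

  @override
  Uint8List toValue() => Uint8List.fromList(utf8.encode(_buffer.toString()));
}

// Hand-written equivalent of utf8'...' for a literal without interpolations.
Uint8List utf8Bytes(String text) => Utf8Interpolator().addString(text).toValue();
```

The encoding happens when toValue() runs, so the result is an ordinary run-time object and not a constant.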
I'd prefer if it was possible to create the UTF-8 bytes at compile-time instead, but that's probably a job for macros (@jakemac53 - expression macros which expand to something else, yay or nay?)
Dart source text is represented as a sequence of Unicode code points.
That's all the spec says, but in practice the compiler only accepts UTF-8. (Just tried with UTF-16 LE/BE with BOM, and no success, it must be UTF-8).
My idea is to use other encodings as well: json, ascii, ... , myCustomCodec.
I'd prefer if it was possible to create the UTF-8 bytes at compile-time instead, but that's probably a job for macros (@jakemac53 - expression macros which expand to something else, yay or nay?)
Yes I think expression level macros would be well suited for this (but we haven't attempted to specify them yet)
There is small discussion of something similar here. A strawman could be:
@b static const _HTTP = 'HTTP'; // generates: static const HTTP = [72, 84, 84, 80];
Sorry, I misunderstood. Yes, source code is UTF-8, but 'Hello, 世界' is compiled to UTF-16.
I'm not sure what specifies the character encoding of source code, though.
@Levi-Lesches dart has external keyword for internal implementation:
@b('HTTP')
external List<int> get HTTP;
// generates:
const List<int> HTTP = <int>[72, 84, 84, 80];
FYI, C# supports UTF-8 string literals. What's new in C# 11 - C# Guide | Microsoft Learn
Another option for parametrization is to allow controlling escapes.
Consider adding a member
int addEscape(Iterable<int> charCodes);
which gets called when seeing a \.
(You can't intercept the escape of the current quote, like \" for a " or """ string, because the parser needs to know where the string literal ends.)
Maybe the tag can implement one of RawTag for no escapes (or interpolations, but what's the point then?), or CustomEscapeTag for intercepting escapes, so the parser knows how to treat the coming
Maybe it can also choose whether to apply to only one string literal, or all following string literals, to allow adjacent string literals to combine. But maybe that should be the default. (Otherwise a tag "string" "string" where the first tag "string" would evaluate to a valid tag, would be ambiguous.)
An easier approach would be to just recognize normal escapes or not, but allow "invalid escapes" and keep them in the strings passed to the tag, rather than remove them. Then the tag processor can interpret them as it wants.
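One tag that would benefit from receiving escapes untouched is a regular expression tag. In this sketch the raw strings passed to addString simulate what the parser would hand over if it left \d and similar sequences in place. RegExpInterpolator and digitsThen are hypothetical names.

```dart
abstract class Interpolator<R, T> {
  Interpolator<R, T> addString(String string);
  Interpolator<R, T> addValue(T value);
  R toValue();
}

// Hypothetical tag: literal parts are pattern source (escapes kept as-is),
// interpolated strings are matched literally.
class RegExpInterpolator implements Interpolator<RegExp, String> {
  final StringBuffer _source = StringBuffer();

  @override
  RegExpInterpolator addString(String string) {
    // The backslash sequences are interpreted by RegExp, not by the parser.
    _source.write(string);
    return this;
  }

  @override
  RegExpInterpolator addValue(String value) {
    _source.write(RegExp.escape(value));
    return this;
  }

  @override
  RegExp toValue() => RegExp(_source.toString());
}

// Hand-written equivalent of re"^\d+$literal$" with escapes passed through.
RegExp digitsThen(String literal) => RegExpInterpolator()
    .addString(r'^\d+')
    .addValue(literal)
    .addString(r'$')
    .toValue();
```

Here digitsThen('a.b') matches '12a.b' but not '12axb', because the dot in the value is escaped while the \d in the template keeps its regular expression meaning.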
This format (aesthetically unattractive as it is), also has an unfortunate property: it makes it difficult to control whitespace and indentation. Here's how the program will look in real life:
class A {
  String generateJson() {
    //...
    return jsn"""
{ $name: $value,
  "other": [$v1, $v2],
  "all": [
    ${for (var i = 0; i < values.length; i++) ...[if (i > 0) ",", values[i]]}
  ]
}""";
  }
}
Any attempt to fix the formatting will lead to unwanted whitespace in the generated json.
In other words, either the source formatting will be off, or the output formatting will be off, or both.
None of these problems occur in the ~ variant discussed in a competing thread.
It should give you full control over indentation and whitespace. It might not be convenient to use that control, but whitespace inside interpolations is ignored, and whitespace outside is not.
return jsn"""
{ $name: $value,
"other": [$v1, $v2],
"all": [${ for (var i = 0; i < values.length; i++)
...'\n ${values[i]}${if (i > 0) ...','}'
}]
}
""";
(where ... stringExpression emits that content into the outer template)
The outside whitespace is a problem. If you want to fix the source formatting in my previous example, you have to write
class A {
  String generateJson() {
    //...
    return jsn"""
      { $name: $value,
        "other": [$v1, $v2],
        "all": [
          ${for (var i = 0; i < values.length; i++) ...[if (i > 0) ",", values[i]]}
        ]
      }""";
  }
}
But then, you have extra spaces at the beginning of each output line.
String literals of the form """...""" do not play well with source formatting. That's why in some languages, they allow a symbol like | to mark the actual starting position
"""
|first line
|second line
""";
which removes all whitespace before | and produces the output with no leading spaces:
first line
second line
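For reference, the margin-stripping behavior can be approximated with a small helper in current Dart. stripMargin is a made-up function here (similar helpers exist in other languages), and the handling of blank first and last lines is an assumption.

```dart
// Removes everything up to and including the margin marker on each line.
// Whitespace-only first and last lines are dropped; lines without a marker
// are kept unchanged.
String stripMargin(String text, {String marker = '|'}) {
  final lines = text.split('\n').toList();
  if (lines.isNotEmpty && lines.first.trim().isEmpty) lines.removeAt(0);
  if (lines.isNotEmpty && lines.last.trim().isEmpty) lines.removeLast();
  return lines.map((line) {
    final trimmed = line.trimLeft();
    return trimmed.startsWith(marker)
        ? trimmed.substring(marker.length)
        : line;
  }).join('\n');
}

// The indentation before each | is only source formatting.
String example() => stripMargin('''
    |first line
    |second line
    ''');
```

An interpolator could apply the same stripping to the strings it receives in addString, which would leave interpolated values untouched.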
But even then, it's not immediately clear how to produce the output where each value in the list is placed on a separate line, with a correct offset, like
"all": [
  1,
  2,
  3
]