Formatted string literals (PEP 498)

Open arturdryomov opened this issue 5 years ago • 17 comments

PEP 498 introduced formatted string literals which make this possible:

>>> name = "Star-Lord"
>>> f"Starlark, say hi to {name}"
'Starlark, say hi to Star-Lord'

It would be nice to have it as an alternative to the regular string formatting, especially since modern languages like Kotlin and Swift make this formatting convenient and less error-prone.

// Kotlin
val name = "Star-Lord"
val greeting = "Starlark, say hi to $name"

// Swift
let name = "Star-Lord"
let greeting = "Starlark, say hi to \(name)"

However, I can completely understand that sometimes a language should be simple. So this is just an idea.

arturdryomov avatar Dec 04 '19 05:12 arturdryomov

Would be really nice to have f-strings in Starlark, f"{a}-{b}" is clearer and simpler to read than either "%s-%s" % (a, b) or "{}-{}".format(a,b), and avoids the standard gotchas of string interpolation (from the spec):

One subtlety: to use a tuple as the operand of a conversion in format string containing only a single conversion, you must wrap the tuple in a singleton tuple:
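The quoted passage ends where the spec gives its example. As a minimal sketch of the same gotcha, here is how it looks in Python, whose % operator behaves the same way in this respect; `point` and the format text are illustrative names, not taken from the spec.

```python
point = (3, 4)  # illustrative value, not from the spec

# A bare tuple is unpacked as the operand list, so a single %s sees two operands.
try:
    "coordinates=%s" % point
    failed = False
except TypeError:
    failed = True

# Wrapping the tuple in a singleton tuple passes it as one operand.
wrapped = "coordinates=%s" % (point,)

# An f-string has no such ambiguity: the expression is formatted as a whole.
interpolated = f"coordinates={point}"
```

Both `wrapped` and `interpolated` produce the same text; only the % form needs the extra wrapping.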

Using f-strings has become the default for us in all our Python 3 projects, so I'm assuming it'll become the standard way to do things in Python once Python 2 hits EoL next month.

gibfahn avatar Dec 06 '19 11:12 gibfahn

Nice though this syntax may be, I never thought to myself while writing Python that it needs yet a third way of formatting strings. The implementation is decidedly non-trivial, affecting the scanner, parser, resolver, compiler, and runtime. Because f-strings count as variable references, the syntax tree would need to record both the literal f-string (for syntactic tools like formatters) and its parse tree (for semantic tools, like the compiler). The kludge by which x[y] appearing within a string is recursively parsed as x["y"] is quite subtle.

alandonovan avatar Jan 10 '20 14:01 alandonovan

+1 f-strings would be nice.

xycodex avatar Mar 29 '20 20:03 xycodex

It makes a big difference when you build a lot of strings; for example if you use starlark to build sh commands (it can significantly reduce code volume and help avoid trivial bugs).

files = sh(f"ls {dir}").stdout.split()
for i, file in enumerate(files):
    print(f"Checking file[{i}]: {file}")
    
    result = sh(f"stat -c %h -- '{file}'", warn=True)
    if result.exit_code > 1:
        print(f" {file!r} has {result.exit_code} references")

vs

files = sh("ls {}".format(dir)).stdout.split()
for i, file in enumerate(files):
    print("Checking file[{}]: {}".format(i, file))

    result = sh("stat -c %h -- '{}'".format(file), warn=True)
    if result.exit_code > 1:
        print(" {file!r} has {exit} references".format(file=file, exit=result.exit_code))

I do realise print has a special case for string formatting, but it doesn't help in the general case.

Alphasite avatar Aug 07 '20 17:08 Alphasite

Python doesn't need a third way of formatting strings. But I do think it needs one way where the expressions to be spliced in the string go at the point they get spliced, rather than at the end.

world = 'world'
print('Hello {}!'.format(world))
print('Hello %s!' % world)
print(f'Hello {world}!')

In only one variant does the world go before the !. The other two formatting strings could be removed as far as I care (but alas, I appreciate never will be).

The kludge by which x[y] appearing within a string is recursively parsed as x["y"] is quite subtle.

Can you explain what you mean by that? From the PEP I see:

Due to Python's string tokenizing rules, the f-string f'abc {a['x']} def' is invalid

So it seems like x[y] would get parsed as x[y]?
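For reference, the two readings of x[y] can be observed side by side in Python. This is only a sketch of Python's behaviour, with `d` and `y` as illustrative names; it does not say how a Starlark implementation would or should treat the same text.

```python
d = {"y": "from the string key", "other": "from the variable"}
y = "other"

# In str.format, the text inside [...] is not an expression: a non-numeric
# index is used as the literal string key "y".
via_format = "{d[y]}".format(d=d)

# In an f-string, the replacement field is an ordinary expression, so y is
# looked up as a variable.
via_fstring = f"{d[y]}"
```

Here `via_format` reads `d["y"]` while `via_fstring` reads `d[y]`, so the same field text selects different entries.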

ndmitchell avatar Mar 12 '21 20:03 ndmitchell

relates also to https://github.com/bazelbuild/bazel/issues/685

ittaiz avatar Oct 01 '21 04:10 ittaiz

F-strings lend themselves very well to a language that is designed to be used for configuration, as the templated string can be read very plainly. I think this is what @ndmitchell was getting at with his comparison of the three methods. F-strings are far more declarative than printf-style formatting, but serve a different use case than .format.

My opinion would be that Starlark should focus on which string formatting techniques make reading a Starlark config file feel more like reading a config file and less like a script. A config file should provide less ambiguity in its declarations.

f-strings vs:

Percent formatting: Percent formatting has some issues that are fully acknowledged by the core devs and community. They call it out at the top of the docs for printf style formatting https://peps.python.org/pep-0498/#differences-between-f-string-and-str-format-expressions. The evolution of interpolation via f-strings was a long-standing request from the community for a cleaner and more robust string formatting solution.

.format: F-string interpolation can't be deferred. .format and percent formatting can both be deferred, and since percent formatting is inferior to .format, this comes down to a contest between .format and f-strings. I would argue that f-strings and .format serve different use cases purely because of this difference, though .format has another advantage in its ability to pass things like values derived from dictionary keys in a friendlier way.
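A small Python sketch of the deferral difference; `template`, `greet_deferred`, and `greet_eager` are illustrative names only.

```python
# A .format template is an ordinary string: it can be defined once, before
# any value for name exists, and filled in later.
template = "Hello {name}!"

def greet_deferred(name):
    return template.format(name=name)

# An f-string is evaluated where it is written, so name must already be in
# scope at that point.
def greet_eager(name):
    return f"Hello {name}!"
```

Both functions return the same text; the difference is that `template` can be stored, passed around, or loaded from elsewhere before it is filled in.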

tl;dr: F-strings are far and away the easiest way to read a template string from the standpoint of a user reading a config file. .format allows for deferred evaluation of template strings.

LISTERINE avatar Oct 20 '22 20:10 LISTERINE

In only one variant does the world go before the !.

print('Hello ' + world + '!')

I know it's ugly, but in my opinion, this is still better than using the wrong ordering, especially when you have more than 2 variables to put in there. So unfortunately I will argue this is the best convention available for starlark at the moment.

TamaMcGlinn avatar Sep 06 '23 08:09 TamaMcGlinn

Starlark-rust now supports f-strings as an extension (off by default).

ndmitchell avatar Sep 06 '23 13:09 ndmitchell

Any chance this issue is being accepted / implemented in 2024? Not having f-strings is a constant annoyance whenever I have to write Starlark code working with strings. F-strings are nowadays the standard Python approach for formatting strings and increase code readability considerably.

martis42 avatar Dec 30 '23 07:12 martis42

The emoji voting is very consistently enthusiastic, and Starlark-rust has an implementation, so I wouldn't object to a PR to Starlark-go to add it as an optional feature, so long as the Rust and Go implementations can align on the specification. I think it could still be a lot of work.

The folks that maintain the Java implementation should chime in too, as the demands on Blaze and Bazel may be different from other applications of Starlark. It's also possible that many other build-related tools or Starlark refactoring tools would need quite profound updates to accommodate this change, as it creates a way of referring to a symbol without an identifier.

@brandjon @tetromino

adonovan avatar Jan 02 '24 19:01 adonovan

Btw, note that the original request came 8 years ago from Bazel users (myself included).

ittaiz avatar Jan 03 '24 13:01 ittaiz

Suggested action items if we decide to move forward:

  • Discuss the wording for the spec. I assume we want a subset of the features found in Python (f-strings seem complicated).
  • Update Buildifier. This would be needed even if f-strings are an optional feature.
  • Discuss the future of the % interpolation (in separate thread). It doesn't seem useful anymore; we could look into a migration plan.
  • Write PRs for the Java and Go interpreters.

For the people at Google: the Kythe indexer and the parser inside Cider will need to be updated.

(fyi for the context: I'm no longer working at Google)

laurentlb avatar Jan 03 '24 17:01 laurentlb

Does anyone know the latest on this? Is it only awaiting PRs for Java and Go, or is there a missing spec to be agreed upon?

albertocavalcante avatar Jul 29 '24 17:07 albertocavalcante

As far as I know, there has been no update since my message. To clarify the scope, I think:

  • We can support f-strings so that f"...{x}..." is equivalent to "...{}...".format(x)
  • Features like =, ! conversion, : format_spec are not supported.
  • So inside the braces, we can only have an expression (Test rule in our grammar)

This should simplify things compared to the Python implementation. They still mention some corner cases (e.g. comments inside an f-string and reusing the same quote character), that we can forbid if it helps.
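The restricted scope above can be sketched as a small desugaring step, written here in Python rather than in any of the Starlark implementations. Everything in it is an assumption for illustration: the names `desugar` and `render` are hypothetical, the scanner is deliberately over-strict (it rejects any `!`, `:` or `=` inside the braces, which also rules out expressions such as `a == b` or slices, and it does not handle nested braces), and `eval` merely stands in for the interpreter evaluating each expression.

```python
def desugar(template):
    """Turn a restricted f-string body into (format string, expression sources)."""
    out, exprs, i = [], [], 0
    while i < len(template):
        if template.startswith("{{", i) or template.startswith("}}", i):
            # Escaped braces are passed through unchanged for str.format.
            out.append(template[i:i + 2])
            i += 2
        elif template[i] == "{":
            end = template.find("}", i)
            if end < 0:
                raise ValueError("unterminated replacement field")
            expr = template[i + 1:end].strip()
            # Over-strict toy check: no =, ! conversion, or : format_spec.
            if not expr or any(ch in expr for ch in "!:="):
                raise ValueError("unsupported replacement field: " + repr(expr))
            exprs.append(expr)
            out.append("{}")
            i = end + 1
        elif template[i] == "}":
            raise ValueError("single '}' is not allowed")
        else:
            out.append(template[i])
            i += 1
    return "".join(out), exprs


def render(template, env):
    """Evaluate the desugared form, i.e. "...{}...".format(x)."""
    fmt, exprs = desugar(template)
    # eval stands in for the interpreter evaluating each expression in scope.
    return fmt.format(*[eval(e, {}, env) for e in exprs])
```

For example, `desugar("{a}-{b}")` yields the format string `"{}-{}"` plus the expression sources `a` and `b`, which is the equivalence described in the first bullet.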

I cannot make the final call, but I think it would help to implement the feature in Java and Go.

laurentlb avatar Jul 29 '24 19:07 laurentlb

Before we implement it, we should enumerate all the cases currently supported by Python and declare whether we intend to support them in Starlark initially, later, or never. An implementation that supports only an ad hoc subset of the Python features may be hard to extend later. If we're going to support anything, we should probably support all the Python features that make sense for Starlark, which would include such things as iterables, e.g. f"{*[1, 2], *(3, 4)}" = "(1, 2, 3, 4)".
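As a starting point for such an enumeration, a few of the Python behaviours mentioned in this thread can be written down as checks. This is a sketch assuming Python 3.8 or later (the `=` specifier does not exist before that); the variable names are illustrative, and the list is far from complete.

```python
name = "Star-Lord"
n = 42

cases = {
    "plain expression": f"{n + 1}",
    "= specifier": f"{n=}",
    "!r conversion": f"{name!r}",
    ": format_spec": f"{n:>5}",
    "iterable unpacking": f"{*[1, 2], *(3, 4)}",
}
```

Each entry would then be marked as supported initially, later, or never; for instance `cases["iterable unpacking"]` is the string `(1, 2, 3, 4)` in Python.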

adonovan avatar Jul 31 '24 20:07 adonovan