pyupgrade Rewriting within f-strings

This issue also affects django-upgrade: https://github.com/adamchainz/django-upgrade/issues/97

pyupgrade rewrites some names, so we might expect it to do so inside f-strings:

-f"{six.text_type(1)}"'
+f"{str(1)}"

Unfortunately it fails to do so. This is because there's no tokens for the contents of the f-string. Whilst the AST includes the f-strings, the tokenizer leaves the f-string as a single STRING. It's later tokenized inside ast.parse:

In [1]: import ast, tokenize_rt

In [2]: print(ast.dump(ast.parse('f"{x()}"'), indent=2))
Module(
  body=[
    Expr(
      value=JoinedStr(
        values=[
          FormattedValue(
            value=Call(
              func=Name(id='x', ctx=Load()),
              args=[],
              keywords=[]),
            conversion=-1)]))],
  type_ignores=[])

In [3]: tokenize_rt.src_to_tokens('f"{x()}"')
Out[3]:
[Token(name='STRING', src='f"{x()}"', line=1, utf8_byte_offset=0),
 Token(name='NEWLINE', src='', line=1, utf8_byte_offset=8),
 Token(name='ENDMARKER', src='', line=2, utf8_byte_offset=0)]

Perhaps a fix would be to recursively invoke rewriting on f-string nodes.

Nov 25 '21 11:11 adamchainz

the first hard blocker is the ast does not accurately indicate positions inside f-strings inside the current versions supported so it's difficult to even start approaching this problem (as such I haven't even attempted it)

an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)

Nov 25 '21 13:11 asottile

an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)

Yeah that was what I poorly explained as "recursively invoke rewriting". I will look at trying it in django-upgrade first.

Nov 25 '21 14:11 adamchainz

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings https://github.com/python/cpython/issues/102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.

Apr 21 '23 07:04 terryjreedy

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.

Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)

Apr 21 '23 20:04 sunmy2019

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.

Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)

This is not correct, we are indeed working on changes to the python tokenizer and we plan to emit the new tokens there

Apr 24 '23 00:04 pablogsal

Thanks Python 3.12!

Jun 12 '23 07:06 adamchainz

pyupgrade pyupgrade copied to clipboard

Rewriting within f-strings

pyupgrade
pyupgrade copied to clipboard