pyupgrade
pyupgrade copied to clipboard
Rewriting within f-strings
This issue also affects django-upgrade: https://github.com/adamchainz/django-upgrade/issues/97
pyupgrade rewrites some names, so we might expect it to do so inside f-strings:
-f"{six.text_type(1)}"'
+f"{str(1)}"
Unfortunately it fails to do so. This is because there's no tokens for the contents of the f-string. Whilst the AST includes the f-strings, the tokenizer leaves the f-string as a single STRING. It's later tokenized inside ast.parse:
In [1]: import ast, tokenize_rt
In [2]: print(ast.dump(ast.parse('f"{x()}"'), indent=2))
Module(
body=[
Expr(
value=JoinedStr(
values=[
FormattedValue(
value=Call(
func=Name(id='x', ctx=Load()),
args=[],
keywords=[]),
conversion=-1)]))],
type_ignores=[])
In [3]: tokenize_rt.src_to_tokens('f"{x()}"')
Out[3]:
[Token(name='STRING', src='f"{x()}"', line=1, utf8_byte_offset=0),
Token(name='NEWLINE', src='', line=1, utf8_byte_offset=8),
Token(name='ENDMARKER', src='', line=2, utf8_byte_offset=0)]
Perhaps a fix would be to recursively invoke rewriting on f-string nodes.
the first hard blocker is the ast does not accurately indicate positions inside f-strings inside the current versions supported so it's difficult to even start approaching this problem (as such I haven't even attempted it)
an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)
an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)
Yeah that was what I poorly explained as "recursively invoke rewriting". I will look at trying it in django-upgrade first.
I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings https://github.com/python/cpython/issues/102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.
I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.
Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)
I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.
Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)
This is not correct, we are indeed working on changes to the python tokenizer and we plan to emit the new tokens there
Thanks Python 3.12!