
Correct parsing for `f-string` string literal concatenation

Open knutwannheden opened this issue 1 year ago • 0 comments

Python has a feature called string literal concatenation, which allows splitting a literal into several adjacent literals (typically on separate lines) that are then joined into a single string literal at compile time. This is what differentiates them from string literals concatenated using `+`. Currently, the parser parses string literal concatenation into a single `J.Literal` and makes sure that the concatenation is reflected by its `valueSource` property. F-strings, on the other hand, are parsed into a `Py.FormattedString` node, where the parts are stored as expressions in the `parts` property and the start delimiter (e.g. `f"`) in the `delimiter` property.
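The merging described above can be observed directly with the standard `ast` module. This is a small illustration, not code from the parser:

```python
import ast

# Two adjacent plain literals are folded into a single ast.Constant at parse time.
plain = ast.parse('"ab" "cd"', mode="eval").body
print(type(plain).__name__, repr(plain.value))  # Constant 'abcd'

# As soon as one of the adjacent literals is an f-string, the whole run
# becomes a single ast.JoinedStr; the literal boundaries are gone.
mixed = ast.parse('"ab" f"c{x}d"', mode="eval").body
print(type(mixed).__name__)  # JoinedStr

# An explicit `+` is kept as a BinOp, so that form stays distinguishable.
added = ast.parse('"ab" + "cd"', mode="eval").body
print(type(added).__name__)  # BinOp
```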

When string literal concatenation is combined with f-strings, however, this approach breaks down. Instead, the parser should produce `Py.Binary` nodes with the operator set to `Py.Binary.Type.StringConcatenation` and the individual literals in the `left` and `right` properties. The difficulty is that the Python AST has already abstracted this away: the individual literals have been merged into a single `ast.Constant` or `ast.JoinedStr`, so the visitor needs to use the `tokenize()` function to recover them at the correct source offset. This in turn can be problematic, because the tokenizer raises an error if the `INDENT` and `DEDENT` tokens don't match up when tokenization is started mid-stream.
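A minimal sketch of recovering the individual literals with the standard `tokenize` module, assuming the node's source text can be obtained with `ast.get_source_segment()`. The helper name `split_concatenated_literals` is hypothetical, and wrapping the segment in parentheses is only one possible workaround for the `INDENT`/`DEDENT` problem, not necessarily what the parser does: inside brackets the tokenizer emits no `INDENT` or `DEDENT` tokens at all. On Python 3.12+ f-strings are tokenized as `FSTRING_START`/`FSTRING_MIDDLE`/`FSTRING_END` sequences rather than as a single `STRING` token, so the sketch handles both cases.

```python
import ast
import io
import tokenize

# FSTRING_* token types only exist on Python 3.12+; None on older versions.
FSTRING_START = getattr(tokenize, "FSTRING_START", None)
FSTRING_END = getattr(tokenize, "FSTRING_END", None)


def split_concatenated_literals(source: str, node: ast.expr) -> list[str]:
    """Return the source text of each adjacent literal merged into `node`.

    Hypothetical helper: `node` is assumed to be the ast.Constant or
    ast.JoinedStr produced for an implicit string concatenation.
    """
    segment = ast.get_source_segment(source, node)
    if segment is None:
        raise ValueError("node has no source position information")

    # Assumed workaround: wrap the segment in parentheses so continuation-line
    # indentation never produces mismatched INDENT/DEDENT tokens.
    wrapped = "(" + segment + ")"

    # Absolute offset of each line start, split the same way readline does,
    # so (row, col) token positions can be turned into string indices.
    line_starts = [0]
    for line in io.StringIO(wrapped).readlines():
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos
        return line_starts[row - 1] + col

    literals: list[str] = []
    depth = 0  # nesting depth of f-strings when tokenized as FSTRING_* tokens
    start = 0
    for tok in tokenize.generate_tokens(io.StringIO(wrapped).readline):
        if FSTRING_START is not None and tok.type == FSTRING_START:
            if depth == 0:
                start = offset(tok.start)
            depth += 1
        elif FSTRING_END is not None and tok.type == FSTRING_END:
            depth -= 1
            if depth == 0:
                literals.append(wrapped[start:offset(tok.end)])
        elif tok.type == tokenize.STRING and depth == 0:
            # Plain literals, and whole f-strings before Python 3.12.
            literals.append(tok.string)
    return literals


source = "if True:\n    x = ('a'\n         f'b{1}'\n      'c')\n"
node = ast.parse(source).body[0].body[0].value
print(split_concatenated_literals(source, node))  # ["'a'", "f'b{1}'", "'c'"]
```

Tokenizing the raw segment here would start with an unindented line, then indent to column 9 and dedent to column 6, which does not match any outer indentation level; the parentheses avoid that. Each returned string could then become one leaf of a chain of `Py.Binary` nodes; whether that chain nests to the left or to the right is a design choice this sketch does not settle.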

knutwannheden, Oct 19 '24 15:10