Black fails to tokenise files ending with a backslash
Given a file containing a backslash preceded and followed by any number of newlines, Black ae5588 and 19.3b0 raise blib2to3.pgen2.tokenize.TokenError: 'EOF in multi-line statement', (2, 0).
I consider this a bug because Python is perfectly happy to execute such files, doing nothing, and compile("\\", "<string>", "exec") also works:
>>> code = compile("\\", "<string>", "exec") # or "\\\n", or "\n\\\n", etc.
>>> import dis; dis.dis(code)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE
Like #970, I found this with Hypothesmith.
This is still present in Black 19.10b0 - it's a different bug to #922/#948; Python ignores a trailing backslash but Black chokes on it.
It looks like Python's built-in compile behaviour became stricter between py37 and py38; a trailing line continuation statement is no longer accepted.
Python 3.7.7 (default, Apr 1 2020, 13:48:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
<code object <module> at 0x7f60bd565270, file "<STRING>", line 1>
>>>
Python 3.8.7 (default, Dec 22 2020, 10:37:26)
[GCC 10.2.1 20201207] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<STRING>", line 1
    \
    ^
SyntaxError: unexpected EOF while parsing
>>>
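If the tests are to be adjusted per interpreter version, one option is to probe the behaviour at runtime rather than hard-coding a version check. This is only a sketch: `accepts_trailing_backslash` is a hypothetical helper name, not something that exists in Black or its test suite.

```python
def accepts_trailing_backslash(source: str = "\\") -> bool:
    """Return True if this interpreter's compile() accepts `source`.

    Hypothetical helper for illustration only. The default source is a
    lone backslash, the minimal case from the sessions above.
    """
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Per the sessions above: accepted on 3.7, rejected on 3.8.
    print(accepts_trailing_backslash())
```

A test could then skip or expect the tokenizer error only when this returns True, so nothing needs changing once py37 is no longer supported.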
This could still be addressed; there's some work-in-progress included in #1961. Does anyone have suggestions on how best to proceed?
"Ignore this until py37 reaches end of life" seems like a reasonable plan to me, and it's easy enough to adjust the tests accordingly.
Another example where this has surfaced during fuzzer testing, after merging #1991:
https://github.com/psf/black/pull/1958/checks?check_run_id=1945936278
Falsifying example: test_idempotent_any_syntatically_valid_python(
    src_contents='\n\x0c\\\r\n',
    mode=Mode(target_versions=set(), line_length=88, string_normalization=False, magic_trailing_comma=True, experimental_string_processing=False, is_pyi=False),
)
It might be possible to adjust the special case regular expression in the exception handler to permit this too. Perhaps we should also be a bit wary of getting into an attempt to detect a universe of valid-ish programs via a regex, though.
Aw, heck. Form-feed (\x0c) is always tricky... see e.g. https://github.com/Instagram/LibCST/issues/446.
I think we should just check "\\" in src_contents instead of using regex 😅
That's possible, though it seems like that might be quite permissive. That said, I suppose the EOF-in-multiline exception should be quite rare and selective.
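To make the trade-off concrete, here is a rough comparison of the two checks. The regex is a hypothetical stand-in (a backslash followed only by whitespace up to end of input), not the pattern actually used in the exception handler, and both function names are made up for illustration.

```python
import re

# Hypothetical pattern, NOT the one in Black's test suite: a backslash
# followed only by whitespace (including \r, \n, \x0c) up to end of input.
_TRAILING_BACKSLASH = re.compile(r"\\\s*\Z")


def regex_check(src_contents: str) -> bool:
    """Selective: only matches a backslash at the very end of the source."""
    return _TRAILING_BACKSLASH.search(src_contents) is not None


def substring_check(src_contents: str) -> bool:
    """Permissive: matches any backslash anywhere in the source."""
    return "\\" in src_contents


if __name__ == "__main__":
    samples = [
        "\n\x0c\\\r\n",        # the falsifying example from the fuzzer
        'print("a\\nb")\n',    # ordinary code with an escape in a string
    ]
    for src in samples:
        print(repr(src), regex_check(src), substring_check(src))
```

Both checks accept the form-feed example, but the substring check also accepts any source containing an escape sequence. That only matters if such a source also raises the EOF-in-multiline error, which should be rare.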
Just digging back through some old issue threads.. Py3.7 is EOL nowadays, so perhaps this issue can be closed? (backslash at end-of-file causes a black parser error -- and since Py3.8, the Python parser considers that invalid too)
I like it when the universe fixes the bug for you.