Black fails to tokenise files ending with a backslash

Open Zac-HD opened this issue 6 years ago • 6 comments

Given a file containing a backslash preceded and followed by any number of newlines, Black ae5588 and 19.3b0 throw blib2to3.pgen2.tokenize.TokenError: 'EOF in multi-line statement', (2, 0).

I consider this a bug because Python is perfectly happy to execute such files, doing nothing, and compile("\\", "<string>", "exec") also works:

>>> code = compile("\\", "<string>", "exec")  # or "\\\n", or "\n\\\n", etc.
>>> import dis; dis.dis(code)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

Like #970, I found this with Hypothesmith.

Zac-HD avatar Sep 10 '19 04:09 Zac-HD

This is still present in Black 19.10b0 - it's a different bug to #922/#948; Python ignores a trailing backslash but Black chokes on it.

Zac-HD avatar Oct 29 '19 23:10 Zac-HD

It looks like the behaviour of Python's built-in compile became stricter between py37 and py38; a trailing line continuation (a backslash at end of file) is no longer accepted.

Python 3.7.7 (default, Apr  1 2020, 13:48:52) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
<code object <module> at 0x7f60bd565270, file "<STRING>", line 1>
>>>
Python 3.8.7 (default, Dec 22 2020, 10:37:26) 
[GCC 10.2.1 20201207] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> compile('\\', '<STRING>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<STRING>", line 1
    \
    ^
SyntaxError: unexpected EOF while parsing
>>> 
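
The same comparison can be scripted so it runs unchanged on each interpreter. This is only a minimal sketch: the helper name accepts_source is hypothetical, and it assumes that a SyntaxError from compile is the only failure of interest.

```python
import sys


def accepts_source(src: str) -> bool:
    """Return True if this interpreter's compile() accepts src as a module.

    Hypothetical helper for illustration; not part of Black.
    """
    try:
        compile(src, "<string>", "exec")
    except SyntaxError:
        return False
    return True


# The variants mentioned at the top of this issue. Run this under each
# interpreter you care about and compare the printed results.
for candidate in ("\\", "\\\n", "\n\\\n"):
    print(sys.version_info[:2], repr(candidate), accepts_source(candidate))
```

Going by the sessions above, the lone backslash should be accepted on 3.7 and rejected on 3.8.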

This could still be addressed; there's some work-in-progress included in #1961. Does anyone have suggestions on how best to proceed?

jayaddison avatar Feb 15 '21 18:02 jayaddison

"Ignore this until py37 reaches end of life" seems like a reasonable plan to me, and it's easy enough to adjust the tests accordingly.

Zac-HD avatar Feb 15 '21 21:02 Zac-HD

Another example where this has surfaced during fuzzer testing, after merging #1991:

https://github.com/psf/black/pull/1958/checks?check_run_id=1945936278

Falsifying example: test_idempotent_any_syntatically_valid_python(
    src_contents='\n\x0c\\\r\n',
    mode=Mode(target_versions=set(), line_length=88, string_normalization=False, magic_trailing_comma=True, experimental_string_processing=False, is_pyi=False),
)

It might be possible to adjust the special case regular expression in the exception handler to permit this too. Perhaps we should also be a bit wary of getting into an attempt to detect a universe of valid-ish programs via a regex, though.

jayaddison avatar Feb 21 '21 12:02 jayaddison

Aw, heck. Form-feed (\x0c) is always tricky... see e.g. https://github.com/Instagram/LibCST/issues/446.

I think we should just check "\\" in src_contents instead of using regex 😅

Zac-HD avatar Feb 21 '21 23:02 Zac-HD

> I think we should just check "\\" in src_contents instead of using regex

That's possible.. it seems like that might be quite permissive, though. That said, I suppose the EOF-in-multiline exception should be quite rare and selective.
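
For concreteness, the substring check might look roughly like the sketch below. The helper name is hypothetical, and matching on the exception's message text is an assumption based on the TokenError quoted at the top of this issue, not the actual code in Black's fuzz test.

```python
def is_trailing_backslash_eof(src_contents: str, exc: Exception) -> bool:
    """Guess whether a tokenizer error is the known trailing-backslash case.

    Hypothetical helper: the error is tolerated only if the source contains
    a backslash anywhere AND the message is the EOF-in-multi-line one.
    """
    return "\\" in src_contents and "EOF in multi-line statement" in str(exc)


# Stand-in for blib2to3's TokenError, built with the arguments reported above.
error = Exception("EOF in multi-line statement", (2, 0))

# The falsifying example from the fuzzer run, form feed included.
print(is_trailing_backslash_eof("\n\x0c\\\r\n", error))
```

The permissiveness is visible here: any source containing a backslash, for example inside a string literal, would also pass the first half of the check, so the exception message is doing most of the filtering.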

jayaddison avatar Feb 22 '21 10:02 jayaddison

Just digging back through some old issue threads.. Py3.7 is EOL nowadays, so perhaps this issue can be closed? (a backslash at end-of-file causes a Black parser error -- and since Py3.8, the Python parser considers that invalid too)

jayaddison avatar Aug 25 '23 10:08 jayaddison

I like it when the universe fixes the bug for you.

JelleZijlstra avatar Aug 25 '23 16:08 JelleZijlstra