black
black copied to clipboard
Crash on backlash followed by whitespace in a docstring
Describe the bug
Black crashes when a backslash escape is present in a docstring (which is valid python code).
To Reproduce
For example, take this code:
def foo():
"""
line ending with backslash \
"""
The resulting error is (link to the playground):
INTERNAL ERROR: Black produced code that is not equivalent to the source. Please report a bug on https://github.com/psf/black/issues. This diff might be helpful: /tmp/blk_zuzitj9b.log
--- src
+++ dst
@@ -18,11 +18,11 @@
value=
Constant(
kind=
None, # NoneType
value=
- 'line ending with backslash \\', # str
+ 'line ending with backslash', # str
) # /Constant
) # /Expr
decorator_list=
name=
'foo', # str
However, when we run black with the following code (one-line docstring)
def foo():
"""line ending with backslash \
"""
Black works as expected (link to the playground).
Expected behavior
Black should leave the docstring as-is.
Environment
- Black's version: [22.8.0]
- OS and Python version: [Windows 10/Python 3.8.11] -->
To be clear, the docstring line has to end with \
(a backslash followed by a space) for this bug to trigger
I was attempting to tackle this one for Hacktoberfest 2022. Turns out to be a tricky situation. I do have a potential fix, but the fix causes a flake8
failure on commit. So, my solution isn't a viable option.
For anyone else hoping to work on this one, here are some notes you might find helpful:
TL;DR
When running black
, it will strip all trailing whitespaces (which is generally the desired behavior). Then black
verifies that the original input source code (src
) is functionally equivalent to the generated output source code (dst
).
When a docstring contains a line ending with a backslash and a space (\
), stripping the trailing space character satisfies the rules of black
and flake8
, but fails the functional equivalency check.
Leaving the space character in for this one special case allows the functional equivalency check to pass, but that special case breaks the flake8
checks.
More Details
The root of the current failure is in the check_stability_and_equivalence
function of the __init__.py
file.
This calls assert_equivalent
which essentially uses the ast
module to parse both the original input source code (src
) and the generated output source code (dst
). The resulting objects are then converted back into strings (src_ast_str
and dst_ast_str
), and these strings are compared. If they don't match exactly, we raise an AssertionError
.
The Problem
When ast
parses the original source (src
), the docstring line ending with a backslash followed by a space (\
) is left unchanged. However, when ast
parses the output source (dst
), the docstring ending with just a backslash (\
with no space - since the trailing space was stripped by black
) gets modified, and the backslash is stripped. This information is lost.
So when converting the parsed original input source (src
) back into a string, we still have a docstring line that ends in a backslash and a space (\
), but converting the parsed output source (dst
) back into a string, we've lost the trailing backslash (\
) in the docstring line. At this point, it's irretrievable (we lost it in the ast parse).
When we compare the two strings, they won't be the same, so we assert.
Solutions
There are several potential solutions to this issue.
- Change
ast
to not strip trailing backslashes from lines in docstrings.
PROS
- fixes the problem at the source, requiring no changes to
black
- allows for all trailing whitespaces to still be stripped
CONS
- fundamentally changes how
ast
works- will have wide reaching implications, with a high probability of breaking existing systems that depend on
ast
- Change how we check for code equivalence (i.e. don't use
ast
parse).
PROS
- isolates changes to one function in
black
CONS
- complicates the verification step
- prone to error (it will be easy to make mistakes in verification)
- Change the rules; allow trailing whitespaces in docstring lines
PROS
- isolates changes to one function in
black
- gives developers more freedom in their docstrings
CONS
- breaks existing rules (
flake8
)- requires "special case" code
Conclusion
As far as I can see, there's no nice, clean solution here. Of the options outlined above, I personally prefer option 3. This is just my opinion, but it seems reasonable to allow developers to format their docstrings however they see fit.
I see docstrings as a way for the developer to convey information to other developers working in that code. Developers should be allowed to use that space however they see fit in order to convey information in the clearest possible manner; even if it means allowing them to have trailing spaces.
If you are interested, I have a simple implementation of option 3 available (with unit tests). However, I can also understand how that solution might not be desired. Feel free to comment. All feedback is welcome.
Thank you for the write-up. As you've worked with it already, can you estimate if it would be feasible to only allow whitespace after backslashes?
Unfortunately, as far as I can tell, any trailing whitespace will cause a failure in the flake8
checks.
I'm not very familiar with flake8
, so it might be worth investigating whether or not there is some way to disable just the one check. The Selecting and Ignoring Violations section of the documentation might help.
EDIT: It's been a while since I looked at this, but my initial "fix" focused on allowing a trailing white space in docstrings (but that causes the issues with flake8
iirc).
--- a/src/black/strings.py
+++ b/src/black/strings.py
@@ -78,8 +78,8 @@ def fix_docstring(docstring: str, prefix: str) -> str:
for i, line in enumerate(lines[1:]):
stripped_line = line[indent:].rstrip()
+ if line[-2:] == "\\ ":
+ stripped_line += " "
if stripped_line or i == last_line_idx:
trimmed.append(prefix + stripped_line)
Yes definitely, single checks can be excluded with their label 👌 I'm more interested in making Black not crash, because then users can either ignore the check or fix their docstring.