black icon indicating copy to clipboard operation
black copied to clipboard

Crash on backlash followed by whitespace in a docstring

Open LouisJustinTALLOT opened this issue 2 years ago • 5 comments

Describe the bug

Black crashes when a backslash escape is present in a docstring (which is valid python code).

To Reproduce

For example, take this code:

def foo():
    """
    line ending with backslash \ 
    """

The resulting error is (link to the playground):

INTERNAL ERROR: Black produced code that is not equivalent to the source.  Please report a bug on https://github.com/psf/black/issues.  This diff might be helpful: /tmp/blk_zuzitj9b.log

--- src
+++ dst
@@ -18,11 +18,11 @@
           value=
             Constant(
               kind=
                 None,  # NoneType
               value=
-                'line ending with backslash \\',  # str
+                'line ending with backslash',  # str
             )  # /Constant
         )  # /Expr
       decorator_list=
       name=
         'foo',  # str

However, when we run black with the following code (one-line docstring)

def foo():
    """line ending with backslash \ 
    """

Black works as expected (link to the playground).

Expected behavior

Black should leave the docstring as-is.

Environment

  • Black's version: [22.8.0]
  • OS and Python version: [Windows 10/Python 3.8.11] -->

LouisJustinTALLOT avatar Sep 07 '22 19:09 LouisJustinTALLOT

To be clear, the docstring line has to end with \ (a backslash followed by a space) for this bug to trigger

zsol avatar Sep 08 '22 09:09 zsol

I was attempting to tackle this one for Hacktoberfest 2022. Turns out to be a tricky situation. I do have a potential fix, but the fix causes a flake8 failure on commit. So, my solution isn't a viable option.

For anyone else hoping to work on this one, here are some notes you might find helpful:

TL;DR

When running black, it will strip all trailing whitespaces (which is generally the desired behavior). Then black verifies that the original input source code (src) is functionally equivalent to the generated output source code (dst).

When a docstring contains a line ending with a backslash and a space (\ ), stripping the trailing space character satisfies the rules of black and flake8, but fails the functional equivalency check.

Leaving the space character in for this one special case allows the functional equivalency check to pass, but that special case breaks the flake8 checks.

More Details

The root of the current failure is in the check_stability_and_equivalence function of the __init__.py file.

This calls assert_equivalent which essentially uses the ast module to parse both the original input source code (src) and the generated output source code (dst). The resulting objects are then converted back into strings (src_ast_str and dst_ast_str), and these strings are compared. If they don't match exactly, we raise an AssertionError.

The Problem

When ast parses the original source (src), the docstring line ending with a backslash followed by a space (\ ) is left unchanged. However, when ast parses the output source (dst), the docstring ending with just a backslash (\ with no space - since the trailing space was stripped by black) gets modified, and the backslash is stripped. This information is lost.

So when converting the parsed original input source (src) back into a string, we still have a docstring line that ends in a backslash and a space (\ ), but converting the parsed output source (dst) back into a string, we've lost the trailing backslash (\) in the docstring line. At this point, it's irretrievable (we lost it in the ast parse).

When we compare the two strings, they won't be the same, so we assert.

Solutions

There are several potential solutions to this issue.

  1. Change ast to not strip trailing backslashes from lines in docstrings.

PROS

  • fixes the problem at the source, requiring no changes to black
  • allows for all trailing whitespaces to still be stripped

CONS

  • fundamentally changes how ast works
  • will have wide reaching implications, with a high probability of breaking existing systems that depend on ast
  1. Change how we check for code equivalence (i.e. don't use ast parse).

PROS

  • isolates changes to one function in black

CONS

  • complicates the verification step
  • prone to error (it will be easy to make mistakes in verification)
  1. Change the rules; allow trailing whitespaces in docstring lines

PROS

  • isolates changes to one function in black
  • gives developers more freedom in their docstrings

CONS

  • breaks existing rules (flake8)
  • requires "special case" code

Conclusion

As far as I can see, there's no nice, clean solution here. Of the options outlined above, I personally prefer option 3. This is just my opinion, but it seems reasonable to allow developers to format their docstrings however they see fit.

I see docstrings as a way for the developer to convey information to other developers working in that code. Developers should be allowed to use that space however they see fit in order to convey information in the clearest possible manner; even if it means allowing them to have trailing spaces.

If you are interested, I have a simple implementation of option 3 available (with unit tests). However, I can also understand how that solution might not be desired. Feel free to comment. All feedback is welcome.

e-b-olson avatar Oct 30 '22 19:10 e-b-olson

Thank you for the write-up. As you've worked with it already, can you estimate if it would be feasible to only allow whitespace after backslashes?

felix-hilden avatar Jan 12 '23 20:01 felix-hilden

Unfortunately, as far as I can tell, any trailing whitespace will cause a failure in the flake8 checks.

I'm not very familiar with flake8, so it might be worth investigating whether or not there is some way to disable just the one check. The Selecting and Ignoring Violations section of the documentation might help.

EDIT: It's been a while since I looked at this, but my initial "fix" focused on allowing a trailing white space in docstrings (but that causes the issues with flake8 iirc).

--- a/src/black/strings.py
+++ b/src/black/strings.py
@@ -78,8 +78,8 @@ def fix_docstring(docstring: str, prefix: str) -> str:
         for i, line in enumerate(lines[1:]):
             stripped_line = line[indent:].rstrip()
 
+            if line[-2:] == "\\ ":
+                stripped_line += " "
 
             if stripped_line or i == last_line_idx:
                 trimmed.append(prefix + stripped_line)

e-b-olson avatar Jan 13 '23 06:01 e-b-olson

Yes definitely, single checks can be excluded with their label 👌 I'm more interested in making Black not crash, because then users can either ignore the check or fix their docstring.

felix-hilden avatar Jan 14 '23 09:01 felix-hilden