black icon indicating copy to clipboard operation
black copied to clipboard

Use optional parentheses more often

Open cdce8p opened this issue 4 years ago • 33 comments

Describe the style change

For short if statements (just one and/or) black tries to get rid of optional parentheses whenever possible. One common result of that behavior is that the line is instead spilt on a function call which often reduces the readability compared to optional parentheses. Even if the optional parentheses are inserted manually, black will reformat the code.

Examples in the current Black style with desired style

Some examples taken from the pylint code base.

if not is_node_in_type_annotation_context(node) and isinstance(
    node.parent, astroid.Subscript
):
    pass

# better
if (
    not is_node_in_type_annotation_context(node)
    and isinstance(node.parent, astroid.Subscript)
):
    pass
def _supports_mapping_protocol(value: astroid.node_classes.NodeNG) -> bool:
    return _supports_protocol_method(
        value, GETITEM_METHOD
    ) and _supports_protocol_method(value, KEYS_METHOD)

# better
def _supports_mapping_protocol(value: astroid.node_classes.NodeNG) -> bool:
    return (
        _supports_protocol_method(value, GETITEM_METHOD)
        and _supports_protocol_method(value, KEYS_METHOD)
    )
                    if getattr(
                        func, "name", None
                    ) == "iter" and utils.is_builtin_object(func):

# better
                    if (
                        getattr(func, "name", None) == "iter"
                        and utils.is_builtin_object(func)
                    ):

Additional context For any statement with more at least two priority delimiters the above suggested syntax is already the default.

return (
    is_postponed_evaluation_enabled(node)
    and value.qname() in SUBSCRIPTABLE_CLASSES_PEP585
    and is_node_in_type_annotation_context(node)
)

One possible implementation could be the modify these lines here and use > 0 instead of > 1. https://github.com/psf/black/blob/76e1602d60391cca1bb3b21a9ef6ed07a1bca17d/src/black/init.py#L6751-L6754

cdce8p avatar Apr 27 '21 19:04 cdce8p

What is HTML ?

friendyou1 avatar Apr 28 '21 01:04 friendyou1

@friendyou1

From a quick Google Search the summary from Wikipedia says the following:

The HyperText Markup Language, or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets and scripting languages such as JavaScript.

i.e. it's the language that defines webpages (to be precise their content and structure), this page you're reading is made up of HTML. HTML is often used with JS (for interactivity) and CSS (to style and design the page so it looks nice)

Finally, please avoid posting irrelevant comments, it's noisy on a busy issue tracker. Thank you!

ichard26 avatar Apr 28 '21 01:04 ichard26

If someone is interested, I've opened #2163 which would address it.

cdce8p avatar Apr 28 '21 22:04 cdce8p

The linked PR had positive feedback from maintainers but was rejected because of the disruption it would cause. So should this issue be closed by extension, or is there still a discussion to be had about this suggestion?

Marc did ask if keeping user-inserted parentheses would be an option. In my opinion the style would indeed be an improvement. But is it inappropriate with respect to Black's philosophy or reasonable pragmatism?

felix-hilden avatar May 11 '21 13:05 felix-hilden

I've created a followup PR #2237 that would keep user inserted optional parentheses

cdce8p avatar May 16 '21 18:05 cdce8p

Based on my recent ride around the issue tracker, it is evident that this is a highly-requested feature. I'll summarise Łukasz' remarks when closing the initial PR:

  • the change multiplies the number of lines sources would have after formatting
  • it doesn't match the way people write code (naturally splitting lines where you are rather than going back)

Łukasz ended with saying he's also disappointed at the fact that Black is not internally consistent in this regard.

If I may, I'll try to push this closer to a resolution once more.

Keeping user-inserted parentheses

I agree that this is a suboptimal solution, since it is akin to the magic trailing comma and comes with all the similar problems.

Matching the way people write code

I think this is not that important given Black's function. We ruthlessly splits lines, change quotations and add or remove other parentheses. Users "cede control over formatting" to Black, they shouldn't expect Black to improve their current desired (lack of) style.

Disrupting existing code

This is the biggie. If we can agree on a good set of rules to actually do this, it could be tested on the primer projects to see exactly how much disruption it would cause. But getting too mathematical is not the point either.

Is the proposed style better?

There seems to be no question that this is the better style in the examples that were given:

if not is_node_in_type_annotation_context(node) and isinstance(
    node.parent, astroid.Subscript
):
    pass

# VS

if (
    not is_node_in_type_annotation_context(node)
    and isinstance(node.parent, astroid.Subscript)
):
    pass

But it is not that obvious in more complex cases like in our own tests (example taken from the original PR test code modifications):

if prevp.parent and prevp.parent.type in {
    syms.typedargslist,
    syms.varargslist,
    syms.parameters,
    syms.arglist,
    syms.argument,
}:

# VS (what cdce8p wrote)

if (
    prevp.parent
    and prevp.parent.type
    in {
        syms.typedargslist,
        syms.varargslist,
        syms.parameters,
        syms.arglist,
        syms.argument,
    }
):

# VS (another possibility, which I kinda like)

if (
    prevp.parent
    and prevp.parent.type in {
        syms.typedargslist,
        syms.varargslist,
        syms.parameters,
        syms.arglist,
        syms.argument,
    }
):

But the problem is more general than if statements. I closed lots of similar issues above, but I feel like even #236 and #348 are closely related to this. So I think we should really come up with a careful set of rules, or better yet: examples of what parentheses should look like if we're going to continue pursuing this any further.

To that end, I'd like to ask from the maintainers and especially @ambv: has the ship sailed? Are you absolutely opposed to any changes of this sort? If yes, I don't think we should waste any more resources discussing it. But if not, I'd like to thoroughly investigate the possibilities and come up with something that could be worth doing.

felix-hilden avatar Jun 16 '21 17:06 felix-hilden

Matching the way people write code

This is less about how people write code and more about how people read it. The need for a tool like Black tells use that the way people write code is actually inconsistent and messy. Users choose to cede control over formatting to Black because the readability of the resulting code is, in 99% of cases, as good or better than it was before. It seems evident that this problem case makes up a disproportionate part of that 1% of outputs which have subpar readability.

Is the proposed style better?

In my opinion, the resulting code in the given counter examples is not worse than it was before. If it turned out that this was representative of the worst case scenario, I don't think asking whether the proposed rules make improvements in all cases is really what we should focus on. Rather, we should evaluate whether the improvements which we already know about outweigh the costs of the disruption these changes would cause.

rmorshea avatar Nov 04 '21 04:11 rmorshea

I think it's also worth noting that much of the discussion around this change happened before the new stability policy. The --future flag could be a great opportunity to introduce this change and collect feedback without any real disruption. If it turns out from the feedback that this does more harm than good, we can just revert it before the next annual release.

rmorshea avatar Nov 04 '21 04:11 rmorshea

I just add my voice in favor of this issue with the following example :

                mnopqrstuvwxyza = (
                    set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
                    - set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
                )

black currently suggests this modification that is sub-optimal for readability :

-                mnopqrstuvwxyza = (
-                    set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
-                    - set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
+                mnopqrstuvwxyza = set(stuvwxyz_abcdefghi[pqrstuvwxy].keys()) - set(
+                    stuvwxyz_abcdefghij[pqrstuvwxy].keys()
                 )

I did not check if the PR handles this case also, but I hope it is the case :)

LLyaudet avatar Jan 11 '22 10:01 LLyaudet

Though I post a small update. Unfortunately, I don't have time to work on it anytime soon. If someone else is interested in picking this up, please feel free to do so!

cdce8p avatar Jan 15 '22 21:01 cdce8p

Another case from #587: I'd expect binary operators to be formatted similarly to boolean operators.

felix-hilden avatar Feb 01 '22 18:02 felix-hilden

A specific example, quoting from https://github.com/psf/black/issues/587#issuecomment-475581498:

I have a code snippet as follows:

        looks_like_pep = (
            file_path.startswith("pep-") and
            file_path.endswith((".txt", "rst")
        )

With line length 79, black re-formats this code to:

        looks_like_pep = file_path.startswith("pep-") and file_path.endswith(
            (".txt", "rst")
        )

pradyunsg avatar Feb 01 '22 18:02 pradyunsg

This is intentional, there is no automated rule we can do that will behave better. If we always put organizing parentheses whenever there is only a single operator between two expressions, this would lead to things like

a = (
  bbbbbb(1234567890)
  + cccccc(123456790)
)

instead of

a = bbbbbb(123456790) + cccccc(
    123456790
)

This formatting also takes up fewer lines.

The rule now uses organizing parentheses when there's more than one operator involved. It's a very simple rule that is easy to explain. If we made deviations here, not only would we invite other edge cases, but we would also have trouble explaining what Black did in those suboptimal cases.

Therefore, I'm somewhat skeptical we can improve much here without making everything much more complicated.

ambv avatar Feb 01 '22 18:02 ambv

Fair points by Łukasz! (For the time being I'm just organising the issues. Still discussions to be had.)

A somewhat tricky case from #1094 though: when having both binary and boolean operators, the order of operations should determine which is used to split first:

if str(
    self.user.id
) != self.target_user_id and not CheckUserManagementViewPermission.user_has_permission(
    self.user
):
    pass

# becomes

if (
    str(self.user.id) != self.target_user_id
    and not CheckUserManagementViewPermission.user_has_permission(
        self.user
    )
):
    pass

felix-hilden avatar Feb 01 '22 18:02 felix-hilden

Personally, I think your newer example format is cleaner.

a = (
  bbbbbb(1234567890)
  + cccccc(123456790)
)

means you only have complete pieces of logic on each line rather than breaking inconsistently part way through functions.

It's easier to read complete sentences when they're together, and it creates more legible diffs:

--- a/example.py
+++ b/example.py
@@ -1,4 +1,4 @@
 a = (
   bbbbbb(1234567890)
-  + cccccc(123456790)
+  - dddddd(123456790)
 )

rather than

--- a/example.py
+++ b/example.py
@@ -1,3 +1,3 @@
-a = bbbbbb(123456790) + cccccc(
+a = bbbbbb(123456790) - dddddd(
     123456790
 )

xM8WVqaG avatar Feb 01 '22 18:02 xM8WVqaG

@felix-hilden it's confusing to cram both issues here. Let's focus here on the single-operator parentheses.

@xM8WVqaG Black will prefer fewer vertical lines over more, if it can be helped. Unfortunately, changing this will affect a lot of code which now formats okay and will format worse. It's a relatively easy change for you to make to your Black installation:

  1. Install it with pip install --no-binary=:all: black.
  2. Run python3 -c 'import black.lines; print(black.lines.__file__)'.
  3. Open the file under the path returned by the last command.
  4. Edit the line that says:
    if bt.delimiter_count_with_priority(max_priority) > 1:

to say > 0 instead. Test with as much code as you can lay your hands on. You'll see some regressions in behavior.

One more thing to note here that you won't be aware of unless you actually contributed to Black. Black formats everything (with one exception that is irrelevant here) in incoming token order, which usually is "left to right". This produces regular and predictable formatting because it doesn't "look forward" to try to reformat things and then decide whether they are nice or not.

In turn, it won't know the difference between, say:

a = (
    bbbbbb(123456790)
    + cccccc(
        1234567890
        + 1234567890
   )
)

instead of the much more reasonable

a = bbbbbb(123456790) + cccccc(
    1234567890 + 1234567890
)

ambv avatar Feb 01 '22 18:02 ambv

It's difficult since it would effect almost every project that uses black today. However, just from seeing how many related issues were created, it looks like many users would like an improvement (me included).

Just from the initial testing I've done with #2237, it might be worth limiting any changes to LOGIC_PRIORITY or higher for now. There are still some corner cases with that, but much fewer than when including all operators.

cdce8p avatar Feb 01 '22 19:02 cdce8p

While many people do report this, it's hard to know where the majority opinion lies just by looking at the number of issues. The people who are satisfied with the current behavior aren't posting issues, and thus, are not participating in this conversation. I'd personally appreciate some change here, but it's important to remember that there are a lot of black users whose viewpoints are not being represented.

rmorshea avatar Feb 01 '22 19:02 rmorshea

The "hierarchical first" line-split seems always better to me. If not chosen as the sensible default, it would be nice to have it as a configuration option.

LLyaudet avatar Feb 02 '22 09:02 LLyaudet

I think whatever is decided here, we won't have a configuration option. We don't want to give any unnecessary control over formatting.

felix-hilden avatar Feb 02 '22 11:02 felix-hilden

@ambv I'm sorry, but for me the 'counter-examples' that you show look better/are easier to read than what black is suggesting. Maybe there are different ways how different people read code. But especially in your two examples:

a = bbbbbb(123456790) + cccccc(
    123456790
)

a = bbbbbb(123456790) + cccccc(
    1234567890 + 1234567890
)

the closing parentheses are visually completely dis-aligned to the code which they enclose, because they are vertically and horizontally separated from their opening parenthesis.

It is not only that, if you need line breaks, one would prefer to have both objects of an operation horizontally aligned and having complete pieces of logic on one line, but also about the quick identification of matching parentheses and their enclosed content.

MarkusPiotrowski avatar Feb 02 '22 12:02 MarkusPiotrowski

@felix-hilden here is a possible take on the example from tests

if prevp.parent and prevp.parent.type in {
    syms.typedargslist,
    syms.varargslist,
    syms.parameters,
    syms.arglist,
    syms.argument,
}:

# start from scratch

if prevp.parent and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}:

# add parentheses

if (
    prevp.parent
    and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
):

# still long, add more

if (
    prevp.parent
    and (
        prevp.parent.type
        in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
    )
):

# still long

if (
    prevp.parent
    and (
        prevp.parent.type
        in {
            syms.typedargslist,
            syms.varargslist,
            syms.parameters,
            syms.arglist,
            syms.argument,
        }
    )
):

# try to inline in reverse

if (
    prevp.parent
    and (
        prevp.parent.type in {
            syms.typedargslist,
            syms.varargslist,
            syms.parameters,
            syms.arglist,
            syms.argument,
        }
    )
):

@felix-hilden this is kinda similar to your last option. Frankly, it looks like it could be inlined once more and it will be exactly it. Anyway, putting and and in on the same level here isn't right IMO.

lig avatar Feb 04 '22 16:02 lig

@lig I think you could continue to inline twice more back to the original desired style from the test. I do not think it's unreadable.

A recursive algo which inlines leaves as it returns from the call stack as long as the max line length is not violated is a good way to go about it. The point is to break the outer-most element if the line is too long. It's what my intuition is, and what I would write naturally if I cared. In reality I just let everything go to the right and let black clean it up 😁. I have even written a pretty printer that used that exact algo because pprint looks awful.

Examples of different max-line-lengths of @ambv's example with the stated algo.
# max-line-length=1
aaaaaaaaaa = (
  bbbbbb(
    123456790
  )
  + cccccc(
    1234567890
    + 1234567890
  )
)

# max-line-length=20
aaaaaaaaaa = (
  bbbbbb(123456790)
  + cccccc(
    1234567890
    + 1234567890
  )
)

# max-line-length=40
aaaaaaaaaa = (
  bbbbbb(123456790)
  + cccccc(1234567890 + 1234567890)
)

# max-line-length=60
aaaaaaaaaa = (
  bbbbbb(123456790) + cccccc(1234567890 + 1234567890)
)

# max-line-length=80
aaaaaaaaaa = bbbbbb(123456790) + cccccc(1234567890 + 1234567890)

ktbarrett avatar Feb 22 '22 02:02 ktbarrett

I would suggest (re)considering the use of a configuration option to balance the interests here. The magic trailing comma is a similar case - it's a syntactically benign way to signal formatting intent.

Some of the formatting examples above are so pathological (as are the cases in our code base - I omit them as the topic is well covered above) that I think it should be a priority to address them in one way or another.

In the case of boolean expressions, adding a trailing and True is similar to the trailing comma - and cleans up the formatting but at the obvious expense.

I don't follow the argument that fewer lines of code trump readability.

Disruption to existing code is obviously a big deal. But if the implementation is to use grouping parenthesis as a cue for the new formatting rules, these would have to be explicitly added and existing code should not be impacted. Again, similar to the trailing comma. And if there's a config option - also similar to trailing comma - there's no impact at all or users could opt out (depending on the default behavior).

gar1t avatar Apr 12 '23 14:04 gar1t

I would suggest (re)considering the use of a configuration option to balance the interests here.

We avoid that here.

I don't follow the argument that fewer lines of code trump readability.

They usually do because they allow more content to fit one screen or page. Black additionally prefers to format things in a way where lines read from left to right instead of pre-emptively exploding into one-element-per-line.

The unclear examples in this open issue are obviously controversial, but a ton of others aren't, and nobody lists them here because they just look good. The current rule requiring "more than one delimiter" to use optional parentheses was added because without it, the output was clearly overusing parentheses leading to explosion of newlines in unexpected places.

The problem with a tool that applies the same rules every time is that in some cases a given rule leads to output that is less obvious. It seems here most pushback stems from using square brackets or parentheses from a call to split a line.

In the case of organizing optional parentheses, there is no easy-to-express rule to make it work for everybody. Most of all, it's to a large extent subjective. What to some is "so pathological that it should be a priority", is not a big deal (or is indeed preferred) by others.

Using existing parentheses as a marker that they should be kept makes the tool less automatic. It will then happily add parentheses in some cases but when you modify the code, it won't take them out. That's already a source of confusion for users when it comes to trailing commas, and adding parentheses to the mix would make this problem worse.

Originally Black didn't automatically put or remove parentheses, it would just respect what it found already in the file. This was widely criticized, I received many requests to change it. This issue argues, in essence, to roll this change back, which would be an unpopular decision.

ambv avatar Apr 12 '23 16:04 ambv

I would suggest (re)considering the use of a configuration option to balance the interests here.

We avoid that here.

Maybe these couple of cases are those good candidates for having options for them? AFAIR, skip-string-normalization was kinda similarly controversial and it took Guido's opinion to make this into an option. These two are obviously controversial, have different logic already, and confuse both sides.

lig avatar Apr 21 '23 19:04 lig

Just an extra little bump here. I stumbled on this issue because the logic trees in my application are getting worse and worse due to this formatting choice. If I had the option to cause black to format these more nicely, I'd do it in a heartbeat.

if self._rule_has_same_enviroment_tag(
    rule=rule, live_rule=live_rule
) and self._rule_has_matching_source(rule=rule, live_rule=live_rule):
    return live_rule

# vs

if (
    self._rule_has_same_enviroment_tag(rule=rule, live_rule=live_rule)
    and self._rule_has_matching_source(rule=rule, live_rule=live_rule)
):
    return live_rule

The first one is almost unreadable to me.

Wyko avatar May 16 '23 11:05 Wyko

Very much needed. Current behavior is sometimes so unreadable that I have to wrap code in #fmt: on/off.

rijenkii avatar May 26 '23 09:05 rijenkii

@JelleZijlstra invited a PR to address #3800: comments that get egregiously moved. The fix could potentially allow commenting to preserve optional parentheses. Not saying that's good or bad, just relevant to this discussion.

Putative "magic comment" in action:

a = (
    bbbbbb(123456790)  #
    + cccccc(123456790)
)

itcarroll avatar Aug 01 '23 04:08 itcarroll

I'd like a real solution but at minimum I think this is a problem with the docs. Similar to https://github.com/psf/black/issues/2156#issuecomment-1505408040 adding an extra or None at least makes it readable.

_cfg_dir = _os.environ.get("RYO_USER_CONFIG_DIR") or _appdirs.user_config_dir(
    "ryo-iso", "HxR"
)
_cfg_dir = (
    _os.environ.get("RYO_USER_CONFIG_DIR")
    or _appdirs.user_config_dir("ryo-iso", "HxR")
    or None
)

Aside from --line-length 99, I'm still not sure how to improve this sort of thing.

(self.base_image["distro"], self.base_image["version"]) = self.iso_cfg[
    "image"
].split("/")

LucidOne avatar Aug 05 '23 18:08 LucidOne