black icon indicating copy to clipboard operation
black copied to clipboard

Add linebreak in comprehension with complex element expression

Open Peilonrayz opened this issue 3 years ago • 17 comments

To me Black's heuristics are too eager to wrap comprehensions up into one line.

Do you think:

(Output from Black)

new_values = [
    left if left < right else right for left, right in values if 0 <= right
]

is easier to skim-read than:

new_values = [
    left
    if left < right else
    right
    for left, right in values
    if 0 <= right
]

I can change the code to have really long names to break Black's heuristics. And Black outputs a much easier to read format. The only problem is the names I've picked to get the nice readable format are way less readable. So now I've traded readability for readability.

new_values = [
    a_super_long_name
    if a_super_long_name < b_super_long_name else
    b_super_long_name
    for a_super_long_name, b_super_long_name in values
    if 0 <= b_super_long_name
]

(Output from Black)

new_values = [
    a_super_long_name
    if a_super_long_name < b_super_long_name
    else b_super_long_name
    for a_super_long_name, b_super_long_name in values
    if 0 <= b_super_long_name
]

How can I up the heuristics so much so I can get the following output from Black?

new_values = [
    value
    for value in values
]

I'd much rather always have the above, than have complex comprehensions go on one line.

Peilonrayz avatar Apr 20 '21 22:04 Peilonrayz

Just use # fmt: off and # fmt: on to disable formatting for given code block. Fitting to the single line if possible imho definitely should remain a standard approach.

jaklan avatar Apr 26 '21 01:04 jaklan

I tend to agree that if there's an else statement in a comprehension, then exploding it usually always increases readability.

Maybe there's potential for a formatting rule there?

hukkin avatar Apr 27 '21 10:04 hukkin

@jaklan Using # fmt: on/off is not a reasonable solution for a couple of reasons.

  1. I want black to format the inner parts of the comprehension.

    Say someone uses ' rather than "; I'd want black to fix that. (I do at times, still converting to ") However the binary nature of # fmt: on/off will mean black won't fix issues with the code. Additionally we can see me and black have different opinions on how to format the else. When changing to black's style I will make mistakes inside the # fmt: off which black should fix.

  2. Knowing which comprehensions to # fmt: off is non-obvious.

    I'd not have thought a 7 line comprehension would be wrapped up into one line. As such I'm going to over use # fmt: off and handicap black leading to ¶1.

  3. Converting to black is a needlessly large effort.

    In some projects I have lots of comprehensions. To use black and keep my code readable I would have to manually go through the entire of my code and add # fmt: off everywhere.

  4. Converting my code so black only runs on 50% of the code is a waste.

    If I'm disabling black to the point where 50% of my code isn't being fixed, then what's the point in using black at all? I'd have to spend so much time adding # fmt: off and I'd still have to run other tools and manually fix any errors in my comprehensions.

Upping black's heuristics would fix all of the issues with # fmt: off. Which is why I asked:

How can I up the heuristics so much so I can get the following output from Black?

Peilonrayz avatar Apr 27 '21 11:04 Peilonrayz

@Peilonrayz You simply can't. Black is about compromises driven by the will of the majority / maintainers, not about fitting needs of each individual. You have a different definition of readability than black? So just use yapf and configure it as you wish.

jaklan avatar Apr 27 '21 11:04 jaklan

Please let the maintainers of black speak on behalf of black.

Peilonrayz avatar Apr 27 '21 12:04 Peilonrayz

So please mention people who have your blessing to answer the questions if you don't want to get them from the community - no need to waste time then.

jaklan avatar Apr 27 '21 12:04 jaklan

I agree with breaking apart long line list comprehensions would be a nice to have. But, I feel two main things are stopping us from considering this:

  1. I don’t think the code in order to do this sanely would be simple. I refer to the difficulty we’ve had with the magic-trailing-comma feature.
  2. It would be very disruptive and probably upset a lot of black users (If we can somehow see if majority of users would welcome this, that would elevate this speculation)

I believe the logic today is all just based on line lengths and nothing more. I would welcome proof of concept PRs to address point 1 and see if the code is complex and how many edge cases we uncover. This does not guarantee merging tho.

cooperlees avatar Apr 27 '21 19:04 cooperlees

I've not read black's code but I can think of two simpler ways to add support in black:

  1. Multiply the character length black sees for the comprehension by a factor of n. With a factor of 5 a 10 character comprehension would be seen as 50 characters. 5 is just an example, I can't imagine many people would want more than 2.

  2. Have a separate line length limit for comprehensions.

    For example if we set the length limit to 20 characters the following would be fine:

    my_super_long_name = [x for x in values]
    

I think special casing turneries like hukkinj1 suggested would probably be pretty hard to implement.

However a better method more inline with black's ethos may exist. @cooperlees do you have any opinions on which direction I should set out on an example PR?

Peilonrayz avatar Apr 27 '21 19:04 Peilonrayz

To provide another opinion, I'd like to see this feature too and I mostly agree. But not all the way.

# This I like, and having "else" at the beginning of the line
# rather than at the end of the previous is better in my opinion
# (meaning I'd also like to see this with the shorter names)
new_values = [
    a_super_long_name
    if a_super_long_name < b_super_long_name
    else b_super_long_name
    for a_super_long_name, b_super_long_name in values
    if 0 <= b_super_long_name
]

# This is too much, as it's very easy to read on one line
new_values = [
    value
    for value in values
]

So to me it's a question of complexity. Going beyond just comprehensions, this could be reformatted as well:

some_value = (
    value
    if cond
    else another_value
)

but I don't think it should. At least when not speaking about the implementation of Black, I could see a way of formulating a complexity heuristic for an expression, which could be used to explode statements that don't exceed the line length limit. For example, "explode everything that has two or more nested statements":

[i for i in range(3)]
[
    i
    for i in range(3)
    if i < 2  # Debatable whether this is part of "for" or not but anyways..
]
(a if c else b)
(
    a
    if c1
    else b
    if c2
    else c
)

But I'm not in the know of Black internals, nor am I aware of all the complexities of these rules out in the wild. Thoughts?

felix-hilden avatar May 18 '21 07:05 felix-hilden

In #2841, a set of criteria for this kind of splitting was proposed. I argued (and presented some examples) that they were too loose, meaning they split too many simple statements, but it could be a good starting point. They were:

  • It has more than one attribute access in any single term of the overall expression
  • It has more than one operator in any single term of the overall expression
  • A single term in the expression contains both an attribute access and an operator
  • The entire expression has a length exceeding 40 chars
  • It has an if clause
  • It has more than 1 for clause

Let's continue hashing this out!

felix-hilden avatar Jan 31 '22 11:01 felix-hilden

Given the age on this, I have little hope it will be addressed.

List comprehensions are one of only 3 major issues I have with black. In most cases there isn't any good reason to modify anything, and doing so nearly always makes it worse.

It feels like comprehensions are treated fundamentally wrong 🤷 but I'm having a hard time describing it. They are their own "thing", but the formatter treats them as an expression to squash - but they are a loop and conditional and follow their own rule(s). There is no good reason to force them to a single line.

The first multi-line (as written) is readable, but the black squashed version is awful.

results = [
    self.generate_result_item(f"B{x}", size="XL")
    for x in range(100) if x %2 == 0
]
results = [self.generate_result_item(f"B{x}", size="XL") for x in range(100) if x % 2 == 0]

Adding the magic comma results in a slightly better version. That works but isn't what was desired. If it can avoid collapsing on one it should be fine on the other too. The expanded if isn't desired, but at least its consistent.

results = [
    self.generate_result_item(
        f"B{x}", 
        size="XL",
    )
    for x in range(100) 
    if x %2 == 0
]

I hope not, but comprehensions might just be too complicated and varied to format well deterministically, just like SQL.

awbacker avatar Jul 22 '22 11:07 awbacker

@felix-hilden Is there still interest in a fix for this issue?

I appreciate that it will be difficult to come up with a good set of heuristics for formatting comprehensions along the lines of those you list. I think this is an important issue for readability, so I'm happy to try to come up with better heuristics if there's a chance they might get merged.

Alternatively, I have the following thought.

Would it be possible to add a flag to black to stop black from removing line breaks from comprehensions? It seems the problem is always that black collapses lines in comprehensions that were separated for readability.

When such a flag is passed, black would still be permitted to add line breaks, to meet it's line length requirement and other requirements. To some degree, that flag would provide the best of both worlds, and I think it would make it much easier to use black for those of us with complex comprehensions in our code.

nsfinkelstein avatar Mar 11 '23 16:03 nsfinkelstein

Thank you for the conversation! To add another aspect, black formatted a list comprehension in a surprising — and for me disagreeable — manner:

bla = [
    {
        "multiple": "entries",
        "in": {"this: "dictionary"},
    }
    for a, b, c in some_object.do_stuff(
        arg1, arg2, arg3
    )
]

whereas both for and in would have fitted nicely on their own lines:

bla = [
    {
        "multiple": "entries",
        "in": {"this: "dictionary"},
    }
    for a, b, c 
    in some_object.do_stuff(arg1, arg2, arg3)
]

For me the second version is quite a bit more readable.

jenstroeger avatar May 17 '23 21:05 jenstroeger

@nsfinkelstein I like the sound of your suggestions - just don't edit the newlines in a comprehension outside of function calls embedded within it.

If the temptation to reformat can be overcome, I think this would solve all potential issues:

if expression is ListComprehension:
    raise SkipReformatting()

awbacker avatar May 18 '23 00:05 awbacker

@awbacker Thanks. Just to be clear, that's not exactly the suggestion. I'm suggesting that black still be permitted to add linebreaks in a comprehension (list or otherwise) to meet its other requirements, but that it no longer be permitted to remove them.

noam-delfina avatar Jun 08 '23 16:06 noam-delfina

@noam-delfina I see. A little disagreement then, I don't think black should format them at all other than spacing. If you find yourself in a hole, first rule is to stop digging!

awbacker avatar Jun 09 '23 02:06 awbacker