black Black joins lines when it shouldn't

Describe the bug

Black doesn't count line length correctly in case of khmer characters. They fit screen-wise (kind of) but character length goes beyond limit.

To Reproduce

For example, take this code:

@pytest.mark.parametrize(
    "text, expected_language",
    [
        (
            (
                "សម្រស់ទាវ២០២២ មិនធម្មតា ឥឡូវកំពុងរកតួ នេនទុំ និងពេជ្រ"
                " ប្រញាប់ឡើងទាន់គេមានបញ្ហាត្រូវថតឡើងវិញ "
            ),
            "km",
        ),  # Khmer
    ],
)
def test_detect_language(text, expected_language):
    assert detect_lang_of_text(text) == expected_language

And run it with these arguments:

$ black file.py --target-version py312 --line-length 100 --diff

The resulting error is:

@@ -1,13 +1,10 @@
 @pytest.mark.parametrize(
     "text, expected_language",
     [
         (
-            (
-                "សម្រស់ទាវ២០២២ មិនធម្មតា ឥឡូវកំពុងរកតួ នេនទុំ និងពេជ្រ"
-                " ប្រញាប់ឡើងទាន់គេមានបញ្ហាត្រូវថតឡើងវិញ "
-            ),
+            ("សម្រស់ទាវ២០២២ មិនធម្មតា ឥឡូវកំពុងរកតួ នេនទុំ និងពេជ្រ" " ប្រញាប់ឡើងទាន់គេមានបញ្ហាត្រូវថតឡើងវិញ "),
             "km",
         ),  # Khmer
     ],
 )
 def test_detect_language(text, expected_language):
would reformat file.py

All done! ✨ 🍰 ✨
1 file would be reformatted.

Expected behavior

No changes because line won't fit.

Environment

Black's version:

black, 24.2.0 (compiled: yes)
Python (CPython) 3.12.1

OS and Python version: Linux/Python 3.12.1

Additional context

The line is split correctly before changes but when joined like black does (even when we remove " " and parens) then it doesn't fit inside 100 characters.

Feb 15 '24 17:02 hwlodarczyk-rtbh

This was changed as a result of #3445.

Could you explain why you think this is a bug? The line is displayed as <100 characters in length, even if it contains more than 100 characters.

Feb 15 '24 17:02 JelleZijlstra

In our case this is a problem because our Sonar later reports that line goes over 100 characters. I've used # fmt: off to prevent black from changing it.

I also use Pycharm with visual lines to indicate if it goes over 100 and 120 and with black' formatting it looks like this:

It clearly goes over the line which indicates that depending on editor the line may visually go over or not which is inconsistent. In this case if it relied on character count it would always go below visual limit (unless there are wider characters which count as 1 character which I don't know of).

Feb 15 '24 17:02 hwlodarczyk-rtbh

unless there are wider characters which count as 1 character which I don't know of

This is fairly common for Chinese characters, see the issue I linked.

I assume Sonar is https://www.sonarsource.com/products/sonarlint/ ? Possibly linters should also adjust their line length computations to take character width into account. However, that might be more trouble than it's worth. I think I'd be OK with adjusting our logic to treat all characters as having a minimum length of 1. This would go into the preview style now and into the stable style for 2025.

Feb 15 '24 17:02 JelleZijlstra

Black joins lines when it shouldn't - short width characters.