GUIX Multi-line Text View Word Wrapping Broken for UTF-8 Strings

Open ThomasBurgess2000 opened this issue 7 months ago • 2 comments

Describe the bug

The GUIX multi-line text view widget has a bug in its word-wrapping logic. Word boundary detection fails inconsistently, so lines are broken in the middle of a word instead of at proper word boundaries (spaces, commas, semicolons).

I'm using GUIX 6.1.11.

To Reproduce

Create a multi-line text view widget
Set text containing spaces, such as: "Hola. Esta es una demostractión de inglés"
Observe inconsistent word wrapping behavior

Breaks appear to work properly at punctuation such as commas, but not at spaces. I think this is a UTF-8 issue.

Inside the loop:

ch = string;

#ifdef GX_UTF8_SUPPORT
    _gx_utility_utf8_string_character_get(&string, GX_NULL, &glyph_len);
    current_index += glyph_len;
#else
    string.gx_string_ptr++;
    string.gx_string_length--;
#endif /* GX_UTF8_SUPPORT */

ch.gx_string_length = glyph_len;

So ch is meant to represent the current character (possibly multi-byte). But immediately afterwards, the code does single-byte checks like:

if (ch.gx_string_ptr[0] == GX_KEY_CARRIAGE_RETURN)
...
else if (ch.gx_string_ptr[0] == GX_KEY_LINE_FEED)
...
else if (((text_info -> gx_text_display_width + char_width) > available_width - 1) &&
         (text_info -> gx_text_display_number > 0) &&
         (ch.gx_string_ptr[0] != ' '))
...
if ((ch.gx_string_ptr[0] == ' ') || (ch.gx_string_ptr[0] == ',') || (ch.gx_string_ptr[0] == ';'))

For ASCII, ch.gx_string_ptr[0] works fine, since one character == one byte. For UTF-8, however, ch.gx_string_length may be >1, yet the code only checks the first byte of the sequence. Non-ASCII spaces (e.g. U+00A0 non-breaking space, encoded as 0xC2 0xA0, or U+3000 ideographic space, encoded as 0xE3 0x80 0x80) will never be recognized as valid breakpoints, because their first UTF-8 byte isn’t ' ' (0x20).
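
One possible direction is to decode the whole sequence and compare the resulting code point instead of the first byte. The following is only a minimal sketch: utf8_decode and is_break_code_point are hypothetical helpers (neither is a GUIX API), and the set of break characters is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GUIX: decode one UTF-8 sequence at s.
   Returns the number of bytes consumed (1-4), or 0 if the input is
   truncated or malformed. Overlong encodings are not rejected here. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t value;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }          /* ASCII */

    if ((s[0] & 0xE0) == 0xC0)      { need = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; value = s[0] & 0x07; }
    else return 0;                                       /* stray continuation byte */

    if (len < need) return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;             /* not a continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = value;
    return need;
}

/* Hypothetical predicate: the break set below is an assumption, not
   what GUIX implements. It adds U+3000 to the three ASCII characters
   the existing code already tests. */
static int is_break_code_point(uint32_t cp)
{
    return cp == ' ' || cp == ',' || cp == ';' || cp == 0x3000;
}
```

With something like this, the breakpoint test would compare the decoded code point rather than ch.gx_string_ptr[0], so a multi-byte space is treated the same way as 0x20.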

Also, this condition refuses to backtrack when the overflowing glyph is a space:

else if (((text_info->gx_text_display_width + char_width) > available_width - 1) &&
         (text_info->gx_text_display_number > 0) &&
         (ch.gx_string_ptr[0] != ' '))
{
    if (display_number == 0) {
        break;
    }
    text_info->gx_text_display_width = display_width;
    text_info->gx_text_display_number = display_number;
    break;
}

Expected behavior

Words should break at natural boundaries (spaces, punctuation)
Long words that exceed line width should break at word boundaries when possible
Consistent behavior between ASCII and UTF-8 text
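
To make the expectations above concrete, here is a rough sketch of the line-splitting behavior I would expect. It is not GUIX code: wrap_line_bytes and utf8_seq_len are hypothetical names, and it assumes every character has the same width (one unit), whereas the real widget measures glyph widths in pixels from the font.

```c
#include <stddef.h>

/* Byte length of the UTF-8 sequence introduced by lead byte b.
   Returns 1 for ASCII and for unexpected bytes so the scan always advances. */
static size_t utf8_seq_len(unsigned char b)
{
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: how many bytes of text belong on the next line
   when a line holds at most max_chars characters. */
static size_t wrap_line_bytes(const char *text, size_t len, size_t max_chars)
{
    size_t pos = 0;        /* bytes consumed so far */
    size_t chars = 0;      /* characters placed on this line */
    size_t last_break = 0; /* offset just past the last break char, 0 = none */

    while (pos < len) {
        unsigned char b = (unsigned char)text[pos];
        size_t n = utf8_seq_len(b);
        if (n > len - pos) n = len - pos;       /* truncated tail */

        if (b == '\n') return pos + 1;          /* explicit line break */

        if (chars + 1 > max_chars) {
            if (b == ' ') return pos + 1;       /* overflowing space ends the line */
            if (last_break > 0) return last_break; /* backtrack to word boundary */
            return pos > 0 ? pos : n;           /* long word: split between characters */
        }

        pos += n;                               /* advance by a whole character */
        chars++;
        if (b == ' ' || b == ',' || b == ';') last_break = pos;
    }
    return len;
}
```

In this sketch the line is always cut at a character boundary, and a word is only split when it has no earlier break opportunity on the same line. Calling it repeatedly on the remaining text yields one line per call.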

Impact

Annoyance, bad experience

Aug 29 '25 03:08 ThomasBurgess2000