gpui-component Editor: improve double-click text selection

Nov 04 '25 01:11 FlyingYu-Z

CJK selection is not correct; we should not select the whole section.

Nov 04 '25 01:11 huacnlee

I've looked at your 2 PRs, and some of the technical aspects are still not ideal.

We are open to these improvements.

However, to keep things efficient, I suggest making only minor changes and avoiding large rewrites, especially of complex logic.

This will make it easier for us to review and merge.

Nov 04 '25 02:11 huacnlee

Should we consider using the unicode-width crate to detect full-width (CJK or emoji) characters instead of checking UTF-8 byte length? It might handle wide characters more accurately.
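For illustration, here is a std-only sketch of why UTF-8 byte length is a weak proxy for character width. The helper names are hypothetical. The unicode-width crate mentioned above instead looks up each character's display width from Unicode's East Asian Width data (via its `UnicodeWidthChar::width` method). A "3 or more bytes means wide" rule does flag CJK and emoji, but it also flags letters from narrow scripts such as Devanagari or Thai, which also take three bytes in UTF-8.

```rust
/// Hypothetical heuristic: treat any char that needs 3+ bytes in UTF-8 as full-width.
fn is_wide_by_utf8_len(c: char) -> bool {
    c.len_utf8() >= 3
}

/// For each char in `s`, returns (char, UTF-8 byte length, heuristic verdict).
fn inspect(s: &str) -> Vec<(char, usize, bool)> {
    s.chars()
        .map(|c| (c, c.len_utf8(), is_wide_by_utf8_len(c)))
        .collect()
}
```

For `"é中क😀"` this yields `[('é', 2, false), ('中', 3, true), ('क', 3, true), ('😀', 4, true)]`: the Devanagari letter क is reported as wide even though it is not a full-width character.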

Nov 04 '25 03:11 FlyingYu-Z

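To make sure you understand what I mean, I'll explain in Chinese.

Last time I meant two things:

Make small changes and don't refactor. Your new commit actually changes even more (a small improvement should be a small change).
Implement a function like the is_word_char function we have in GPUI (and cover it with tests). Your new CharType implementation makes everything more complex, and the rules end up scattered through the logic, which makes them less clear.

By contrast, look at this is_word_char: it is very clear which characters count as word characters, and future tweaks to the rules will be easy to maintain.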

```rust
pub(crate) fn is_word_char(c: char) -> bool {
    // ASCII alphanumeric characters, for English, numbers: `Hello123`, etc.
    c.is_ascii_alphanumeric() ||
    // Latin script in Unicode for French, German, Spanish, etc.
    // Latin-1 Supplement
    // https://en.wikipedia.org/wiki/Latin-1_Supplement
    matches!(c, '\u{00C0}'..='\u{00FF}') ||
    // Latin Extended-A
    // https://en.wikipedia.org/wiki/Latin_Extended-A
    matches!(c, '\u{0100}'..='\u{017F}') ||
    // Latin Extended-B
    // https://en.wikipedia.org/wiki/Latin_Extended-B
    matches!(c, '\u{0180}'..='\u{024F}') ||
    // Cyrillic for Russian, Ukrainian, etc.
    // https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode
    matches!(c, '\u{0400}'..='\u{04FF}') ||
    // Some other known special characters that should be treated as word characters,
    // e.g. `a-b`, `var_name`, `I'm`, `@mention`, `#hashtag`, `100%`, `3.1415`,
    // `2^3`, `a~b`, `a=1`, `Self::new`, etc.
    matches!(c, '-' | '_' | '.' | '\'' | '$' | '%' | '@' | '#' | '^' | '~' | ',' | '=' | ':') ||
    // `⋯` character is special used in Zed, to keep this at the end of the line.
    matches!(c, '⋯')
}
```

https://github.com/zed-industries/zed/blob/21f73d9c02681152019ed5703ce8808c841fcbbe/crates/gpui/src/text_system/line_wrapper.rs#L169-L191
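A minimal sketch of double-click selection built only on is_word_char, along the lines suggested above: expand left and right from the clicked character while is_word_char holds, and otherwise select just the clicked character. The function word_range_at is hypothetical and works on byte offsets into a single &str; is_word_char is condensed (the three contiguous Latin blocks are merged into one range) but accepts the same characters. Because CJK characters are not word characters under this rule, clicking one selects only that character.

```rust
use std::ops::Range;

// Condensed copy of is_word_char above: same character set, fewer lines.
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric()
        // Latin-1 Supplement + Latin Extended-A/B (contiguous), and Cyrillic.
        || matches!(c, '\u{00C0}'..='\u{024F}' | '\u{0400}'..='\u{04FF}')
        || matches!(c, '-' | '_' | '.' | '\'' | '$' | '%' | '@' | '#' | '^' | '~' | ',' | '=' | ':')
        || c == '⋯'
}

/// Hypothetical helper: byte range selected by a double-click at byte `offset`.
fn word_range_at(text: &str, offset: usize) -> Range<usize> {
    if offset >= text.len() {
        return offset..offset; // clicked past the end: select nothing
    }
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    // The last char starting at or before `offset` is the one under the cursor.
    let i = chars
        .iter()
        .rposition(|&(start, _)| start <= offset)
        .expect("non-empty text always has a char at byte 0");
    let end_of = |j: usize| chars[j].0 + chars[j].1.len_utf8();
    if !is_word_char(chars[i].1) {
        // Punctuation, whitespace, CJK, ...: select only the clicked char.
        return chars[i].0..end_of(i);
    }
    let (mut l, mut r) = (i, i);
    while l > 0 && is_word_char(chars[l - 1].1) {
        l -= 1;
    }
    while r + 1 < chars.len() && is_word_char(chars[r + 1].1) {
        r += 1;
    }
    chars[l].0..end_of(r)
}
```

With this rule, clicking s in save_data() selects save_data (bytes 0..9) and clicking ( selects only the (. Since . and : are word characters here, foo.bar or Self::new would be selected as a single unit.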

Nov 09 '25 03:11 huacnlee

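I think classifying the clicked character is essential for double-click selection. The point of classifying is to decide which characters can "connect" to each other, so that the selection boundaries can be determined accurately. is_word_char alone cannot distinguish these boundaries: every run of adjacent word characters gets selected together, which is a poor experience.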

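Take the text save_data() as an example:

Clicking s should select the whole save_data, not just save or save_, because the underscore acts as a connector.
Clicking ( or ) should select only the parentheses, because parentheses are classified as punctuation, and characters of the same type can connect.
_ acts as a connector that lets letters and digits join together; this is handled in the is_connectable method.
I classified whitespace as its own Space type; clicking a space selects the adjacent spaces.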
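One way the classification described above could look, as a hedged sketch: CharType and is_connectable are the names used in the PR, but the variants, the rough CJK ranges in is_cjk, and the double_click_range helper are assumptions for illustration, not the PR's actual code. Word and Connector characters join each other, Punctuation joins Punctuation, Space joins Space, and Other (CJK) never extends.

```rust
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CharType {
    Word,        // letters and digits
    Connector,   // `_`, joins word characters
    Punctuation, // ASCII punctuation such as `(`, `)`, `.`
    Space,       // whitespace
    Other,       // CJK etc.: always selected one char at a time
}

// Rough CJK test (assumption): Hiragana/Katakana, CJK Ext. A,
// CJK Unified Ideographs, Hangul syllables.
fn is_cjk(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{30FF}' | '\u{3400}'..='\u{4DBF}'
        | '\u{4E00}'..='\u{9FFF}' | '\u{AC00}'..='\u{D7AF}')
}

fn classify(c: char) -> CharType {
    if c == '_' {
        CharType::Connector
    } else if c.is_whitespace() {
        CharType::Space
    } else if is_cjk(c) {
        CharType::Other // checked before is_alphanumeric, which is true for CJK
    } else if c.is_alphanumeric() {
        CharType::Word
    } else if c.is_ascii_punctuation() {
        CharType::Punctuation
    } else {
        CharType::Other
    }
}

fn is_connectable(a: CharType, b: CharType) -> bool {
    use CharType::*;
    matches!(
        (a, b),
        (Word | Connector, Word | Connector) | (Punctuation, Punctuation) | (Space, Space)
    )
}

/// Hypothetical helper: byte range selected by a double-click at byte `offset`.
fn double_click_range(text: &str, offset: usize) -> Range<usize> {
    if offset >= text.len() {
        return offset..offset;
    }
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let i = chars
        .iter()
        .rposition(|&(start, _)| start <= offset)
        .expect("non-empty text always has a char at byte 0");
    let end_of = |j: usize| chars[j].0 + chars[j].1.len_utf8();
    let kind = classify(chars[i].1);
    if kind == CharType::Other {
        return chars[i].0..end_of(i); // never extend Other
    }
    let (mut l, mut r) = (i, i);
    while l > 0 && is_connectable(kind, classify(chars[l - 1].1)) {
        l -= 1;
    }
    while r + 1 < chars.len() && is_connectable(kind, classify(chars[r + 1].1)) {
        r += 1;
    }
    chars[l].0..end_of(r)
}
```

Under these rules save_data() splits into save_data and (), and foo.bar splits at the dot, unlike the is_word_char rule, which treats . as a word character.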

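Last time you said Chinese text should not be selected as a whole, so my understanding is that clicking a Chinese character should select only that character, without extending the selection. That's why I classified CJK as the Other type: clicking an Other character selects only that single character.
I also limited the selection to a single line, because VSCode and other common editors don't extend a double-click selection across lines either.
In addition, clicking at the end of a line selects toward the left, to mimic VSCode as closely as possible.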
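The single-line restriction and the end-of-line behavior can be sketched as follows, assuming byte offsets into the whole buffer and '\n' line breaks. select_in_line is a hypothetical name, and the word test (alphanumeric or _) is only a placeholder for whichever classifier is used. A click at the end of a non-empty line is moved one character to the left before expanding, so it selects the last word rather than nothing.

```rust
use std::ops::Range;

// Placeholder word test (assumption); swap in is_word_char or a CharType rule.
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Hypothetical: double-click selection that never crosses a line break.
fn select_in_line(text: &str, offset: usize) -> Range<usize> {
    let mut offset = offset.min(text.len());
    assert!(text.is_char_boundary(offset), "offset must be on a char boundary");
    // Bounds of the line containing `offset`; the '\n' itself is excluded.
    let line_start = text[..offset].rfind('\n').map_or(0, |i| i + 1);
    let line_end = text[offset..].find('\n').map_or(text.len(), |i| offset + i);
    // Clicking at the end of a non-empty line: step left onto its last char.
    if offset == line_end && offset > line_start {
        offset -= text[..offset].chars().next_back().map_or(0, char::len_utf8);
    }
    let line = &text[line_start..line_end];
    let rel = offset - line_start;
    let Some(clicked) = line[rel..].chars().next() else {
        return offset..offset; // empty line: nothing to select
    };
    if !is_word(clicked) {
        return offset..offset + clicked.len_utf8();
    }
    // Expand left and right, but only within `line`.
    let start = line[..rel]
        .char_indices()
        .rev()
        .take_while(|&(_, c)| is_word(c))
        .last()
        .map_or(rel, |(i, _)| i);
    let end = line[rel..]
        .char_indices()
        .find(|&(_, c)| !is_word(c))
        .map_or(line.len(), |(i, _)| rel + i);
    line_start + start..line_start + end
}
```

For "foo bar\nbaz", a click at byte 7 (the end of the first line) selects bar (4..7), and a click on baz never reaches into the previous line.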

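Also, I don't think double-click selection is a "small feature".
Even in large IDEs such as VSCode and IDEA, double-click selection is quite complex to implement, because it involves: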

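Many character types: letters, digits, underscores, punctuation, spaces, newlines, Chinese characters, emoji, etc., each with its own selection-boundary rules
Special symbols: function names, identifiers, operators, etc., where the editor must decide which characters join into one unit and which can only be selected individually
Multilingual support: handling text in many languages and all kinds of Unicode characters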
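As one concrete case of the Unicode issues listed above: many emoji are several chars long (a base emoji plus a skin-tone modifier, or several emoji joined by U+200D ZERO WIDTH JOINER), so selecting a single char can cut an emoji in half. A complete solution would use grapheme-cluster segmentation (Unicode UAX #29, e.g. via the unicode-segmentation crate). The sketch below is a deliberately simplified, hypothetical approximation that only handles ZWJ, the emoji variation selector U+FE0F, and the skin-tone modifiers U+1F3FB to U+1F3FF.

```rust
use std::ops::Range;

/// Chars that extend the preceding emoji (simplified subset; an assumption).
fn is_extender(c: char) -> bool {
    matches!(c, '\u{200D}' | '\u{FE0F}' | '\u{1F3FB}'..='\u{1F3FF}')
}

/// Hypothetical: byte range of the emoji sequence (or single char) at `offset`.
fn emoji_cluster_at(text: &str, offset: usize) -> Range<usize> {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let Some(mut i) = chars.iter().rposition(|&(start, _)| start <= offset) else {
        return offset..offset; // empty text
    };
    // Walk left while the current char extends its predecessor or follows a ZWJ.
    while i > 0 && (is_extender(chars[i].1) || chars[i - 1].1 == '\u{200D}') {
        i -= 1;
    }
    // Walk right over extenders and over the char that follows each ZWJ.
    let mut j = i;
    while j + 1 < chars.len() && (is_extender(chars[j + 1].1) || chars[j].1 == '\u{200D}') {
        j += 1;
    }
    chars[i].0..chars[j].0 + chars[j].1.len_utf8()
}
```

Clicking anywhere inside a thumbs-up with a skin-tone modifier (two chars, eight bytes) then selects both chars together instead of leaving a stray modifier behind.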

So character classification is necessary, and double-click selection deserves its own module.

Nov 09 '25 11:11 FlyingYu-Z