
nucleo-matcher documentation: Please clarify what matching "indices" actually are.

Open markus-bauer opened this issue 7 months ago • 0 comments

The documentation makes it sound as if these are character indices: "All .._indices functions will also compute the indices of the matched characters." The example code below shows that this is not the case.

Instead, they are indices into nucleo's Utf32Str, which is built by keeping only the first character of each grapheme cluster: https://github.com/helix-editor/nucleo/blob/5b74652e482f7c07d827f18c6d21e7540c242c69/matcher/src/chars.rs#L185-L207 This only applies when the unicode_segmentation feature is active (it is on by default); otherwise the indices really are character indices.

Helix, which highlights matches in the picker, for example, also treats them as grapheme "indices":

https://github.com/helix-editor/helix/blob/cfb5158cd1b3e5e1962eda66e673c0c35b786046/helix-term/src/ui/picker.rs#L786-L790

Could you please clarify this in the docs, and perhaps provide an example showing how to use those indices with the original haystack string?

A further note about the example below: it was compiled against the latest version from GitHub. The crates.io version doesn't work at all and produces weird grapheme segmentation, so please publish a new release.

use nucleo_matcher::pattern::{Atom, AtomKind, CaseMatching, Normalization};
use nucleo_matcher::Utf32Str;
use unicode_segmentation::UnicodeSegmentation;

fn test(haystack: &str, needle: &str) {
    let mut matcher = nucleo_matcher::Matcher::new(nucleo_matcher::Config::DEFAULT);

    let atom = Atom::new(
        needle,
        CaseMatching::default(),
        Normalization::default(),
        AtomKind::Substring,
        false,
    );

    let mut buf = Vec::new();
    let nucleo_string = Utf32Str::new(haystack, &mut buf);

    let characters = haystack.chars().collect::<Vec<_>>();
    let graphemes = UnicodeSegmentation::graphemes(haystack, true).collect::<Vec<_>>();
    let nucleo_chars = nucleo_string.chars().collect::<Vec<_>>();

    let matches = {
        let mut m = Vec::new();
        atom.indices(nucleo_string, &mut matcher, &mut m);
        m.into_iter().map(|a| a as usize).collect::<Vec<_>>()
    };

    println!("haystack: {}", haystack);
    println!("needle  : {}", needle);

    println!("characters  : {:?}", characters);
    println!("graphemes   : {:?}", graphemes);
    println!("nucleo chars: {:?}", nucleo_chars);

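    println!("matching indices: {:?}", matches);
    // Use the first reported index to look into each of the three views of
    // the haystack; only the views that agree with nucleo return the needle.
    println!("matching character: {:?}", characters.get(matches[0]));
    println!("matching grapheme: {:?}", graphemes.get(matches[0]));
    println!("matching nucleo chars: {:?}", nucleo_chars.get(matches[0]));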
}

fn main() {
    test("abx", "x");
    println!();
    test("g̈bx", "x");
}

Output:

haystack: abx
needle  : x
characters  : ['a', 'b', 'x']
graphemes   : ["a", "b", "x"]
nucleo chars: ['a', 'b', 'x']
matching indices: [2]
matching character: Some('x')
matching grapheme: Some("x")
matching nucleo chars: Some('x')

haystack: g̈bx
needle  : x
characters  : ['g', '\u{308}', 'b', 'x']
graphemes   : ["g\u{308}", "b", "x"]
nucleo chars: ['g', 'b', 'x']
matching indices: [2]
matching character: Some('b')
matching grapheme: Some("x")
matching nucleo chars: Some('x')
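
To make the request concrete, here is a rough sketch of the kind of example the docs could include: mapping the reported indices back to byte ranges of the original haystack. It uses only the standard library, so the segmentation is a deliberately simplified stand-in (a base character plus any following combining marks in U+0300..=U+036F), not nucleo's actual algorithm. The helper names `cluster_byte_ranges` and `matched_slices` are made up for illustration. With the unicode_segmentation crate, `haystack.grapheme_indices(true)` would presumably take the place of the hand-rolled helper, assuming its segmentation agrees with the one nucleo uses.

```rust
use std::ops::Range;

/// Simplified stand-in for grapheme segmentation (an assumption, not nucleo's
/// algorithm): a cluster is one char plus any directly following combining
/// diacritical marks (U+0300..=U+036F). Returns the byte range of each cluster.
fn cluster_byte_ranges(haystack: &str) -> Vec<Range<usize>> {
    let mut ranges: Vec<Range<usize>> = Vec::new();
    for (start, ch) in haystack.char_indices() {
        let end = start + ch.len_utf8();
        let is_combining = ('\u{300}'..='\u{36f}').contains(&ch);
        if is_combining {
            // Extend the previous cluster instead of starting a new one.
            if let Some(last) = ranges.last_mut() {
                last.end = end;
                continue;
            }
        }
        ranges.push(start..end);
    }
    ranges
}

/// Hypothetical helper: turns cluster indices (as reported by the matcher)
/// into slices of the original haystack. Out-of-range indices are skipped.
fn matched_slices<'a>(haystack: &'a str, indices: &[u32]) -> Vec<&'a str> {
    let ranges = cluster_byte_ranges(haystack);
    let slices: Vec<&'a str> = indices
        .iter()
        .filter_map(|&i| ranges.get(i as usize))
        .map(|r| &haystack[r.clone()])
        .collect();
    slices
}

fn main() {
    let haystack = "g\u{308}bx";
    // [2] is the index reported above for the needle "x".
    println!("{:?}", cluster_byte_ranges(haystack)); // prints [0..3, 3..4, 4..5]
    println!("{:?}", matched_slices(haystack, &[2])); // prints ["x"]
}
```

Slicing by byte range rather than by char index keeps the whole cluster intact, so highlighting "g̈" would cover both the base letter and its combining mark.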

markus-bauer avatar Sep 02 '25 12:09 markus-bauer