selectize.js respect_word_boundaries: true breaks when first character of the search term is non-ASCII

I did:

[x] Search to see if my issue has already been submitted
[x] Make sure I'm reporting something precise that needs to be fixed
[x] Give my issue a descriptive and concise title
[x] Create a minimal working example on JsFiddle or Codepen (or gave a link to a demo on the Selectize docs)
[x] Indicate precise steps to reproduce in numbers and the result, like below

An option whose text begins with a non-ASCII (Unicode) character cannot be found by searching for that character.

Steps to reproduce:

Use code from https://jsfiddle.net/w9gecnyo/4/
Search for one of the two Unicode characters: "č" or "Č"

TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).

Expected result: Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).

Actual result: Only option "Ečerova" is listed, presumably because a leading non-ASCII character is not recognized as the start of a word?!

As far as I can tell, this is caused by the \b that Sifter adds for respect_word_boundaries: true. This looks like a problem with the definition of \b itself, so Unicode-aware word boundary detection needs some other trick.

This attempt at regex101.com seems to confirm that:

[screenshot of the regex101.com test]

SO seems to somewhat agree with this diagnosis: https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
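
The diagnosis can also be reproduced in plain JavaScript, without Selectize or Sifter. This is only a minimal sketch: it assumes the search boils down to a case-insensitive regex with \b in front of the term, which is my reading of the behavior and not code copied from Sifter. The variable names are made up for the illustration.

```javascript
// Plain JavaScript, no Selectize or Sifter involved.
// Escapes are used so the precomposed characters are unambiguous.
const options = ["\u010C\u00E1pkova", "E\u010Derova"]; // "Čápkova", "Ečerova"
const term = "\u010D"; // "č"

// Assumption: prefixing the term with \b approximates respect_word_boundaries: true.
const withBoundary = new RegExp("\\b" + term, "i");
// Same search without the boundary requirement.
const withoutBoundary = new RegExp(term, "i");

// \b only exists between a character from [A-Za-z0-9_] and anything else.
// "Č" is outside that set, so there is no \b at the start of "Čápkova",
// while there is one between "E" and "č" in "Ečerova".
const matchedWithBoundary = options.filter((o) => withBoundary.test(o));
const matchedWithoutBoundary = options.filter((o) => withoutBoundary.test(o));

console.log(matchedWithBoundary);    // only "Ečerova"
console.log(matchedWithoutBoundary); // both options
```

So the option that should match is dropped, and the one that should not match is kept.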

Nov 20 '22 14:11 spacekpe

For now, I'm reverting the default of respect_word_boundaries to false. This will work the same as it did before the new feature was introduced. I do think that we need much better Unicode support in general, which will be a bigger fix.

Good catch!

Nov 22 '22 13:11 risadams

Hi guys, I think I'm facing the same issue. But in my case, searching for a Hebrew letter doesn't return anything.

For example, searching for: ש
English letters are OK.

https://jsfiddle.net/sw9Lkcdy/4/

Dec 28 '22 13:12 heyyo-droid

Any chance that the new default (respect_word_boundaries set to false) will be part of a release?
We are using the library from npmjs, which doesn't provide a dev version.
https://www.npmjs.com/package/@selectize/selectize

Feb 07 '23 09:02 heyyo-droid

Found a related issue here with a solution that overrides respect_word_boundaries. The code below forces respect_word_boundaries to false. Fixed it for me. Please let us know if a more elegant solution exists.

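// Keep a reference to the original method so it can still be called.
var getSearchOptions = Selectize.prototype.getSearchOptions;
Selectize.prototype.getSearchOptions = function () {
	var options = getSearchOptions.apply(this, arguments);
	// Force the old behavior for every Selectize instance on the page.
	options.respect_word_boundaries = false;
	return options;
};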

Feb 15 '23 15:02 AndersFreund

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

Jun 16 '23 02:06 github-actions[bot]

Issues don't magically fix themselves, do they? (That's a reaction to the bot.)

Jun 20 '23 16:06 pspacek

This problem also occurs when searching for Chinese. The options include 一二三, but typing 一 does not find that option. Example: https://codepen.io/big-dream-the-solid/pen/poqGWrB

My workaround for this problem is to use an older version, such as 4.6.9.

Oct 10 '23 07:10 big-dream

The example from @heyyo-droid also breaks with the English dash character (-). If you have an item like "Item - 3", it gets filtered out as soon as you type a dash.
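
All the cases reported in this thread (Czech, Hebrew, Chinese, and the dash) fail for the same reason: the first character of the search term is outside [A-Za-z0-9_], so there is no \b in front of it. Below is a rough sketch of one possible alternative, a negative lookbehind with Unicode property escapes. The helper names are hypothetical and none of this is Selectize or Sifter code. The Hebrew option string is made up for the test. It needs an engine with lookbehind and \p{...} support (ES2018).

```javascript
// Hypothetical helpers for illustration; neither is part of Selectize or Sifter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Behavior as described in this thread: \b in front of the term.
function asciiBoundary(term) {
  return new RegExp("\\b" + escapeRegExp(term), "i");
}

// Sketch of an alternative: the term must not be preceded by a letter,
// digit, or underscore from any script.
function unicodeBoundary(term) {
  return new RegExp("(?<![\\p{L}\\p{N}_])" + escapeRegExp(term), "iu");
}

// [search term, option text]
const cases = [
  ["\u010D", "\u010C\u00E1pkova"],        // "č" in "Čápkova"
  ["\u010D", "E\u010Derova"],             // "č" in "Ečerova"
  ["\u05E9", "\u05E9\u05DC\u05D5\u05DD"], // "ש" in a made-up Hebrew option
  ["\u4E00", "\u4E00\u4E8C\u4E09"],       // "一" in "一二三"
  ["-", "Item - 3"],
];

const results = cases.map(([term, option]) => ({
  term,
  option,
  ascii: asciiBoundary(term).test(option),
  unicode: unicodeBoundary(term).test(option),
}));

console.table(results);
```

With \b, only the "Ečerova" row matches. With the lookbehind, every row matches except "Ečerova". One caveat: for scripts written without spaces, such as Chinese, this variant would still only match at the very start of the text (typing 二 would not find 一二三), so it may not be what users of those languages expect.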