
respect_word_boundaries: true breaks when first character of the search term is non-ASCII

Open spacekpe opened this issue 2 years ago • 12 comments

I did:

  • [x] Search to see if my issue has already been submitted
  • [x] Make sure I'm reporting something precise that needs to be fixed
  • [x] Give my issue a descriptive and concise title
  • [x] Create a minimal working example on JsFiddle or Codepen (or gave a link to a demo on the Selectize docs)
  • [x] Indicate precise steps to reproduce in numbers and the result, like below

Non-ASCII/Unicode character at the beginning of an option string cannot be looked up using search.

Steps to reproduce:

  1. Use code from https://jsfiddle.net/w9gecnyo/4/
  2. Search for one of the two Unicode characters: "č" or "Č"

TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).

Expected result: Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).

Actual result: Only option "Ečerova" is listed, presumably because a non-ASCII character is not treated as a word character, so no word boundary is detected in front of it at the start of the string?!

As far as I can tell, this is caused by the \b that Sifter adds for respect_word_boundaries: true. This looks like a problem with the definition of \b, so Unicode-aware word boundary detection needs some other trick.

This attempt at regex101.com seems to confirm that:

screenshot

SO seems to somewhat agree with this diagnosis: https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
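
The diagnosis can also be checked directly in JavaScript. The snippet below is only a sketch: `matchWithAsciiBoundary` is a hypothetical helper that prefixes the search term with a plain `\b`, which is an assumption about what the word-boundary search roughly amounts to, not Sifter's actual code. It relies on the fact that `\b` is defined through `\w`, and `\w` in JavaScript covers only `[A-Za-z0-9_]`.

```javascript
// Hypothetical helper (not part of Selectize or Sifter): filters items with a
// case-insensitive regex that requires an ASCII-based \b before the term.
// The term is assumed to contain no regex metacharacters.
function matchWithAsciiBoundary(term, items) {
  const re = new RegExp("\\b" + term.normalize("NFC"), "i");
  return items.filter((item) => re.test(item.normalize("NFC")));
}

const options = ["Čápkova", "Ečerova"];

// "Č" is not in \w, so there is no \b between the start of the string and "Č".
// In "Ečerova" there IS a \b between "E" (in \w) and "č" (not in \w).
console.log(matchWithAsciiBoundary("č", options)); // [ 'Ečerova' ]
```

This reproduces the reported behavior: the option that starts with the letter is dropped, and the option that has the letter in the middle of the word is kept.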

spacekpe avatar Nov 20 '22 14:11 spacekpe

For now, I'm reverting the default value of respect_word_boundaries to false. This will work the same as it did before the new feature was introduced. I do think that we need much better Unicode support in general, which will be a bigger fix.

Good catch!

risadams avatar Nov 22 '22 13:11 risadams

Hi guys, I think I'm facing the same issue. But in my case, searching for a Hebrew letter doesn't return anything.

  • For example, searching for: ש
  • English letters are OK.

https://jsfiddle.net/sw9Lkcdy/4/

heyyo-droid avatar Dec 28 '22 13:12 heyyo-droid

Any chance that the change of the respect_word_boundaries default to false will be part of a release?
We are using the library from npmjs, which doesn't provide a dev version.
https://www.npmjs.com/package/@selectize/selectize

heyyo-droid avatar Feb 07 '23 09:02 heyyo-droid

Found a related issue here with a solution for setting respect_word_boundaries. The code below makes respect_word_boundaries default to false. It fixed the problem for me. Please let us know if a more elegant solution exists.

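// Keep a reference to the original method so it can still be called.
var getSearchOptions = Selectize.prototype.getSearchOptions;
// Wrap it: build the options as usual, then force the flag off.
Selectize.prototype.getSearchOptions = function () {
	var options = getSearchOptions.apply(this, arguments);
	options.respect_word_boundaries = false;
	return options;
};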

AndersFreund avatar Feb 15 '23 15:02 AndersFreund

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Jun 16 '23 02:06 github-actions[bot]

Issues don't magically fix themselves, do they? (That's a reaction to the bot.)

pspacek avatar Jun 20 '23 16:06 pspacek

This problem also occurs when searching for Chinese. The options contain 一二三, and the option cannot be found by typing . Example: https://codepen.io/big-dream-the-solid/pen/poqGWrB

My solution for this problem is to use an older version like: 4.6.9

big-dream avatar Oct 10 '23 07:10 big-dream

The example from @heyyo-droid also breaks with the English dash character (-). If you have an item like Item - 3, it is filtered out as soon as you type a dash.

rcuhljr avatar Nov 06 '23 16:11 rcuhljr

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Mar 06 '24 02:03 github-actions[bot]

Bot, this issue is still relevant

spacekpe avatar Mar 07 '24 13:03 spacekpe

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Jul 06 '24 02:07 github-actions[bot]

Hi, can't we just activate Unicode support in the regular expression?

See the initial example with u flag activated:

Screenshot 2024-07-10 at 10 40 00

bplace avatar Jul 10 '24 08:07 bplace
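
One caveat when trying this in the browser: in JavaScript's own regex engine, the `u` flag alone does not change `\b`, which stays based on the ASCII-only `\w`. A possible alternative is sketched below, assuming an engine with lookbehind and `\p{…}` support (ES2018). `matchWithUnicodeBoundary` is a hypothetical helper, not part of Selectize or Sifter, and the choice of "letter, number or underscore" as the word-character set is an assumption.

```javascript
// The u flag by itself leaves \b ASCII-based in JavaScript:
console.log(/\bč/iu.test("Čápkova")); // false

// Hypothetical helper: replaces \b with a negative lookbehind that rejects a
// match when the preceding character is a Unicode letter, number or "_".
function matchWithUnicodeBoundary(term, items) {
  // Escape regex metacharacters in the user-supplied term.
  const escaped = term.normalize("NFC").replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("(?<![\\p{L}\\p{N}_])" + escaped, "iu");
  return items.filter((item) => re.test(item.normalize("NFC")));
}

const options = ["Čápkova", "Ečerova"];
console.log(matchWithUnicodeBoundary("č", options)); // [ 'Čápkova' ]
```

Because the lookbehind only inspects the character before the match, a term that itself starts with a non-word character, such as the dash in Item - 3, can still match after a space. Scripts written without spaces between words would still only match at the start of a run of letters, so this is not a complete answer for every language.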