badwords

support for accents

Open · zzgab opened this issue

Some languages (like French) use accents in words. For example, assécher means to dry up (skin, hair, etc.). The native JS RegExp \b splits words in a naive, ASCII-only way: only [A-Za-z0-9_] count as word characters, so é is treated as a word separator. The word is therefore split into ass, é and cher, and ass gets censored.

So we end up with ***écher, which is nonsense in French.
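The problem can be reproduced in plain JavaScript, without badwords itself. The inline /\bass\b/ pattern below is an assumption standing in for a \b-anchored word match. It is not taken from the library's source. Note that even with the u flag, \b still does not treat é as a word character.

```javascript
// JS \b only treats [A-Za-z0-9_] as word characters, so "é" counts as a
// non-word character and creates a boundary on each side of it.
const word = "assécher";

// Splitting on \b breaks the French word into three pieces.
const pieces = word.split(/\b/);
console.log(pieces); // [ 'ass', 'é', 'cher' ]

// A \b-anchored pattern for "ass" therefore matches inside "assécher".
// (Assumed stand-in for a word-boundary profanity check.)
const naivePattern = /\bass\b/i;
console.log(naivePattern.test(word)); // true

// Masking the match letter by letter reproduces the output described above.
const masked = word.replace(/\bass\b/gi, (m) => "*".repeat(m.length));
console.log(masked); // ***écher
```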

This PR improves on a previous, incomplete attempt to support French through a user-provided word separator.

The new option enhancedWordSep (default: false) switches to a separator regexp that works for accented languages.
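The issue does not show the regexp behind enhancedWordSep, so here is only a hypothetical sketch of an accent-aware separator. WORD_SEPARATOR and cleanText are illustrative names, not part of the badwords API. The sketch assumes a word character is any Unicode letter, combining mark, digit or underscore, and it requires Unicode property escapes (ES2018, Node.js 10+).

```javascript
// Hypothetical sketch: an accent-aware alternative to splitting on \b.
// NOT the regexp used by enhancedWordSep (the issue does not show it).
//
// Word characters: any Unicode letter (\p{L}), combining mark (\p{M}, e.g. a
// decomposed acute accent), digit (\p{N}) or underscore. Runs of anything else
// are separators. The capture group makes split() keep the separators, so
// joining the pieces back together restores the original text.
const WORD_SEPARATOR = /([^\p{L}\p{M}\p{N}_]+)/u;

/**
 * Hypothetical helper: replace every whole word found in badWords
 * with placeholder characters, leaving everything else untouched.
 * @param {string} text
 * @param {Iterable<string>} badWords  words to censor (case-insensitive)
 * @param {string} placeholder
 * @returns {string}
 */
function cleanText(text, badWords, placeholder = "*") {
  const banned = new Set(Array.from(badWords, (w) => w.toLowerCase()));
  return text
    .split(WORD_SEPARATOR)
    .map((token, i) =>
      // With exactly one capture group, even indices hold words (or ""),
      // odd indices hold the captured separators.
      i % 2 === 0 && banned.has(token.toLowerCase())
        ? placeholder.repeat(token.length)
        : token
    )
    .join("");
}

console.log(cleanText("Le vent va assécher ma peau.", ["ass"]));
// Le vent va assécher ma peau.
console.log(cleanText("What an ass!", ["ass"]));
// What an ***!
```

Looking up whole tokens in a Set avoids reintroducing \b in the matching step. A \b-anchored pattern such as /\bass\b/i would still match inside assécher even after the text had been split correctly.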

zzgab, Apr 19 '21 16:04