stutter Non whitespace separated languages support


Open c01o opened this issue 3 years ago • 2 comments

Currently stutter uses /[\n\r\s]+/ as a delimiter, so languages that do not separate words with whitespace, such as Japanese, are unusable. It seems google/budoux would work for Japanese at least, but since stutter finds word boundaries dynamically, adopting it would require some breaking changes.
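To make the problem concrete, here is a minimal sketch of splitting on that delimiter. The `splitOnWhitespace` helper and the sample strings are made up for illustration and are not taken from stutter's source; only the regular expression comes from the description above.

```javascript
// The delimiter quoted in this issue.
const DELIMITER = /[\n\r\s]+/;

// Hypothetical helper for illustration only; not stutter's actual code.
function splitOnWhitespace(text) {
  return text.split(DELIMITER).filter((token) => token.length > 0);
}

const english = splitOnWhitespace("The quick brown fox");
const japanese = splitOnWhitespace("今日は天気がいいです");

console.log(english.length);  // 4 tokens, one per word
console.log(japanese.length); // 1 token, the whole sentence
```

Because the Japanese sentence contains no whitespace, it comes back as a single token, so there is nothing to display word by word.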

Dec 08 '21 11:12 c01o

Some of this can be addressed in the locales.json, but it needs some additional attention in the Block.js file as well. I've got an open issue for Persian that will clean up most of that logic to be more flexible and I suspect Japanese will be more easily addressed then. I'll definitely need contribution help for it, though. Determining how to split based on kanji vs hiragana/katakana will be tricky.

Dec 08 '21 12:12 jamestomasino

TBH I highly doubt that implementing a Japanese phrase (文節) detector from scratch will pay off, and I suggest using existing libraries instead.
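As one illustration of leaning on existing tooling, the sketch below uses the built-in `Intl.Segmenter` rather than google/budoux or anything stutter currently does. It assumes a runtime that ships `Intl.Segmenter`; the `segmentWords` helper is hypothetical. Note that it yields word-like segments, which are not the same as 文節 phrases, and the exact boundaries depend on the runtime's ICU data.

```javascript
// Assumption: the runtime provides Intl.Segmenter (recent Node.js and browsers).
// Hypothetical helper for illustration only; not part of stutter.
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // Each item yielded by segment() has a `segment` string property.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

const input = "今日は天気がいいです";
const parts = segmentWords(input, "ja");

// The segments always concatenate back to the input;
// where the boundaries fall is left to the runtime.
console.log(parts);
```

Whether segments this fine-grained read well in a speed reader is an open question; a phrase-level chunker may be a better fit.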

Dec 08 '21 12:12 c01o
