hypher
word splitting regex fails with underdot characters
Hi,
I'm using hypher.js with transliterated Sanskrit, and it doesn't handle characters such as ṇ, ṣ, ḍ, ṭ, etc. The problem seems to be the long regex used to split a string into words (line 107 of hypher.js). I guess its character class doesn't include the Unicode ranges for underdot characters. I've replaced it with a simpler expression:
var words = str.split(/([\s\n\r\t.,:;'"!?-])/g);
which matches word-boundary characters instead of word characters. It works for me, but it isn't comprehensive: you would have to add a few more boundary characters to make it work for more languages.
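
A possible alternative that avoids listing boundary characters by hand is sketched below. It is only a sketch, not hypher.js's actual code: `splitWords` is a hypothetical helper name, and it assumes a JavaScript engine that supports Unicode property escapes (`\p{...}` with the `u` flag, ES2018 or later). It treats any run of characters that are neither letters nor combining marks as a separator, so both precomposed underdot letters and a base letter followed by a combining dot below should stay inside the word.

```javascript
// Hypothetical helper, not part of hypher.js.
// Assumes an engine with Unicode property escapes (ES2018+).
function splitWords(str) {
  // \p{L} matches any letter (including precomposed letters such as ṇ or ṣ);
  // \p{M} matches combining marks (for decomposed forms such as n + U+0323).
  // The capture group keeps the separators in the result, so joining the
  // parts gives back the original string.
  return str
    .split(/([^\p{L}\p{M}]+)/u)
    .filter(function (part) { return part !== ''; });
}

var parts = splitWords('kṛṣṇa, viṣṇu!');
console.log(parts);
```

Digits, apostrophes and hyphens count as separators in this sketch, which may or may not be what a given language needs.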