kuroshiro
TypeError: str is not iterable (edge case)
I'm getting the following error while using kuroshiro, but only in some cases. About 90% of the time, no error is thrown. I do not have the input that triggers it, so I cannot test this case directly.
Stacktrace
TypeError: str is not iterable
at toRawHiragana (/server/node_modules/kuroshiro/lib/util.js:177:14)
at /server/node_modules/kuroshiro/lib/core.js:225:88
at Generator.next (<anonymous>)
at asyncGeneratorStep (/server/node_modules/kuroshiro/lib/core.js:10:103)
at _next (/server/node_modules/kuroshiro/lib/core.js:12:194)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
This issue occurs when converting a sentence that contains one or more ・ (U+30FB, katakana middle dot) characters. I chose to replace this character with · (U+00B7, middle dot) for the duration of the conversion only, and I no longer have the problem.
Here is a minimal example that reproduces the problem (I encountered it in furigana mode, but it might occur in other modes too):
const Kuroshiro = require("kuroshiro");
const KuromojiAnalyzer = require("kuroshiro-analyzer-kuromoji");

const sample = async () => {
  // sentence1 contains · (U+00B7); sentence2 contains ・ (U+30FB).
  const sentence1 = "映画『ジュラシック·パーク』の恐竜は本物そっくりだ。";
  const sentence2 = "映画『ジュラシック・パーク』の恐竜は本物そっくりだ。";
  const kuroshiro = new Kuroshiro();
  await kuroshiro.init(new KuromojiAnalyzer());
  await kuroshiro.convert(sentence1, { mode: "furigana", to: "hiragana" }); // Does not throw
  await kuroshiro.convert(sentence2, { mode: "furigana", to: "hiragana" }); // Throws
};

// Log the rejection instead of leaving it unhandled.
sample().catch(console.error);
One way to apply this workaround is a pair of functions that convert the character back and forth:
const sanitizeJapaneseSentence = (sentence: string) => sentence.replace(/・/gi, '·');
const unsanitizeJapaneseSentence = (sentence: string) => sentence.replace(/·/gi, '・');
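The two functions above could be combined into a single wrapper around the conversion call. This is only a rough sketch in plain JavaScript: `convertWithMiddleDotWorkaround` is a hypothetical name, and the only thing it assumes about the converter is a `convert(text, options)` method that returns a promise resolving to a string, as used in the reproduction code.

```javascript
// Hypothetical helper names; not part of kuroshiro itself.
const KATAKANA_MIDDLE_DOT = "\u30FB"; // ・
const MIDDLE_DOT = "\u00B7"; // ·

// Swap ・ for · before conversion.
const sanitizeJapaneseSentence = (sentence) =>
  sentence.split(KATAKANA_MIDDLE_DOT).join(MIDDLE_DOT);

// Swap · back to ・ after conversion.
const unsanitizeJapaneseSentence = (sentence) =>
  sentence.split(MIDDLE_DOT).join(KATAKANA_MIDDLE_DOT);

// Assumption: `converter` exposes convert(text, options) returning a
// promise of a string, like the kuroshiro instance in the example above.
async function convertWithMiddleDotWorkaround(converter, sentence, options) {
  const converted = await converter.convert(
    sanitizeJapaneseSentence(sentence),
    options
  );
  return unsanitizeJapaneseSentence(converted);
}
```

It would be called as `await convertWithMiddleDotWorkaround(kuroshiro, sentence2, { mode: "furigana", to: "hiragana" })`. One caveat: the round trip is not lossless, because any · (U+00B7) that was already in the input also comes back as ・ (U+30FB).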
Hope this can help!