Furiganaize: Kanji characters that have been furiganaed once may be furiganaed more than once

Open inoueakimitsu opened this issue 2 years ago • 4 comments

Thanks for creating such a useful tool.

I ran Furiganaize on the following text:

作成します。作ります。

Then, I got the following DOM:

<ruby><rb><ruby><rb>作</rb><rt style="">つく</rt></ruby>成</rb><rt style="">さくせい</rt></ruby>します。<ruby><rb>作</rb><rt style="">つく</rt></ruby>ります。

The problem here is that the ruby tags are nested: 作 is annotated once on its own and again as part of 作成.

I have implemented a simple way to fix this.

The modification changes the above output as follows:

<ruby><rb>作成</rb><rt style="">さくせい</rt></ruby>します。<ruby><rb>作</rb><rt style="">つく</rt></ruby>ります。

I am concerned that this modification will adversely affect other processes as I do not yet understand the overall processing flow of this program.
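
The modification itself is not shown in this thread. As a rough illustration only, one way to avoid nested ruby tags is to skip any text node that already sits inside a `<ruby>` element. The helper name `isInsideRuby` is hypothetical and is not taken from Furiganaize; it relies only on the standard `parentNode` and `nodeName` properties of DOM nodes.

```javascript
// Hypothetical guard (not from the Furiganaize source): returns true when
// `node` has a <ruby> element among its ancestors.
function isInsideRuby(node) {
  for (let cur = node.parentNode; cur; cur = cur.parentNode) {
    // nodeName is upper case for HTML elements; normalize to be safe.
    if (cur.nodeName && cur.nodeName.toUpperCase() === "RUBY") {
      return true;
    }
  }
  return false;
}

// Demo with plain objects standing in for DOM nodes.
const paragraph = { nodeName: "P", parentNode: null };
const ruby = { nodeName: "RUBY", parentNode: paragraph };
const rb = { nodeName: "RB", parentNode: ruby };
const annotatedText = { nodeName: "#text", parentNode: rb };
const plainText = { nodeName: "#text", parentNode: paragraph };

console.log(isInsideRuby(annotatedText)); // true
console.log(isInsideRuby(plainText)); // false
```

A caller that walks text nodes could leave any node for which this returns true untouched, so text that is already annotated is never wrapped a second time.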

I would be happy if you could review it.

Jun 16 '22 16:06 inoueakimitsu

The solution above is not sufficient.

It does not correctly furiganize the following text:

食べる。飲食店。

Output:

<ruby><rb>食</rb><rt style="">た</rt></ruby>べる。飲<ruby><rb>食</rb><rt style="">た</rt></ruby><ruby><rb>店</rb><rt style="">てん</rt></ruby>。

飲食店 should be いんしょくてん.

I think it may be possible to avoid this problem by processing long Kanji strings in preference to short Kanji strings.
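
A minimal sketch of the "longer strings first" idea, working on a plain string rather than the DOM. Everything here is an assumption for illustration: `annotateLongestFirst` is a hypothetical name, and the `readings` table is a hand-written stand-in for whatever the tokenizer returns. It ignores context-dependent readings and does not escape HTML.

```javascript
// Hypothetical helper: scan `text` left to right and, at each position,
// prefer the longest surface string that has a known reading.
function annotateLongestFirst(text, readings) {
  // Sort candidate surfaces so longer kanji strings are tried first.
  const surfaces = Object.keys(readings)
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);

  let html = "";
  let i = 0;
  while (i < text.length) {
    const match = surfaces.find((s) => text.startsWith(s, i));
    if (match) {
      html += `<ruby><rb>${match}</rb><rt>${readings[match]}</rt></ruby>`;
      i += match.length; // skip past the matched text so it is never re-annotated
    } else {
      html += text[i];
      i += 1;
    }
  }
  return html;
}

// Hand-written reading table (an assumption, not tokenizer output).
const readings = { "食": "た", "店": "てん", "飲食店": "いんしょくてん" };
console.log(annotateLongestFirst("食べる。飲食店。", readings));
// <ruby><rb>食</rb><rt>た</rt></ruby>べる。<ruby><rb>飲食店</rb><rt>いんしょくてん</rt></ruby>。
```

Because each matched span is consumed before scanning continues, this approach also cannot produce nested ruby tags.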

Jun 16 '22 16:06 inoueakimitsu

Thanks for your detailed analysis of the behavior of this addon.

I am not 100% sure that the longer match (作成 (さくせい)) is always preferable to the shorter matches (作 (つく) and 成 (なる)), so I did not try to fix this before. At least as a non-native speaker, I feel this "bug" lets me learn the 読み方 (readings) of more of the individual kanji... www
I just ran some experiments on 「飲食店飲食店飲食店飲食店飲食店食べる。飲食店。食べる。飲食店。食べる。飲食店。食べる。飲食店。」, and found this
Actually, I do not clearly understand how igo.js is used here, because this package is merely a fork of ilyalissoboi's FuriganaInjectorPlusPlus. I have mainly done bug fixes and UI improvements and added support for dynamic pages, so I have barely changed how igo.js handles the sentences in DOM nodes... (汗) So, sorry, I do not think I can give useful advice on this issue. (I also have no free time to debug this at the moment; there are too many tasks to get done every day...)
But if you are willing to dig into this bug, you are still very welcome to! (If you want to do this, maybe consider adding an option in options_ui for alpha-phase testing, so that existing users are not affected? Just a suggestion, if this issue really needs to be fixed.)

Jun 17 '22 04:06 kuanyui

Thank you for your reply.

I see. I am actually not 100% sure that the longer-match approach always works either. It might be better to first collect sentences to test against. Also, your point that the readings of the individual decomposed kanji are instructive is an interesting perspective.
Related to the first point: do you mean that assigning multiple furigana to the same kanji would be a good way to study, even if some of the furigana candidates are incorrect?
Yes. I greatly appreciate your time and input. I currently find the alignment between the original text in the DOM and the output of igo.js particularly difficult, but I will keep investigating the algorithm on my own for a while longer.
Regarding your suggestion of adding an option for alpha-phase testing to options_ui, I will definitely try that. I will try to create a reference implementation that adds the longer-match approach mentioned above.
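
For the alignment between the original text and the tokenizer output mentioned above, a minimal sketch might map each token back to a character range in the source string. This assumes only that the tokenizer yields surface strings in reading order; `alignTokens` is a hypothetical name, and the token list below is hand-written for illustration rather than real igo.js output.

```javascript
// Hypothetical helper: find the [start, end) range of each token surface
// in `text`, searching forward from the end of the previous token.
function alignTokens(text, surfaces) {
  const spans = [];
  let cursor = 0;
  for (const surface of surfaces) {
    const start = text.indexOf(surface, cursor);
    if (start === -1) {
      // The surface does not occur after the cursor (for example, the
      // tokenizer normalized it); skip it rather than guess a position.
      continue;
    }
    spans.push({ surface, start, end: start + surface.length });
    cursor = start + surface.length;
  }
  return spans;
}

// Hand-written token surfaces (an assumption, not real tokenizer output).
const demoSpans = alignTokens("作成します。作ります。", [
  "作成", "し", "ます", "。", "作り", "ます", "。",
]);
console.log(demoSpans[0]); // { surface: '作成', start: 0, end: 2 }
console.log(demoSpans[4]); // { surface: '作り', start: 6, end: 8 }
```

Since every range refers to the untouched source string, annotations can be applied from the last range to the first without shifting the offsets of earlier tokens.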