rikaikun Recognize colloquial corruptions such as あい/おい → ええ

This might prove useful to those of us who do much of our immersion with anime: colloquial speech contains many common patterns of corruption, such as the vowel pairs ーあい and ーおい being rendered ーええ (e.g., in stereotypically masculine/tough-guy speech).

It occurs to me that this could be detected in much the same way that conjugations of verbs and adjectives are currently detected, perhaps using a tag like "< masc" or something along those lines. There are probably many more patterns than just the two, but those are the two I can remember off the top of my head.

Sep 02 '20 15:09 ChocoChopin

Thanks!

This isn't a bad idea, though the two words I hear the most (すげえ, やべえ) are in the dictionary as separate entries already. What are some other examples we can use for testing?

The double え isn't that common otherwise, so I don't think false positives would be a problem (though that happens with regular verb conjugations as well).

I will say that this would probably be lower priority than some other stuff in the queue, though I am trying to actually make consistent improvements to rikaikun these days.

Sep 03 '20 00:09 melink14

In my experience, characters that do it will tend to do it with a wide variety of words, and seemingly at random--examples I can think of off the top of my head are しつけえ, おせえ, うるせえ, しらねえ, かっけえ. By far the most common transformation is －ない to －ねえ, so that one detection alone would take care of a lot, but it probably wouldn't be feasible to do other words on a word-by-word basis since there doesn't appear to be any consistent pattern to which words get the treatment; you'd just have to take words ending in ええ, change those endings to あい/おい, and see if they then match existing dictionary entries.
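The generic lookup described above (take a word ending in ええ, swap the ending back to あい/おい, and see what matches) could be sketched roughly as follows. This is only an illustration and not rikaikun's actual code: `E_ROW_SOURCES`, `colloquialCandidates`, `lookupColloquial`, and `SAMPLE_DICTIONARY` are hypothetical names, the dictionary is a tiny stand-in array, and treating a trailing ー like え is an extra assumption.

```typescript
// Hypothetical table: each え-row kana mapped to its あ-row and お-row
// counterparts, e.g. ね may come from な (ない → ねえ) or の.
const E_ROW_SOURCES: Record<string, [string, string]> = {
  'え': ['あ', 'お'], 'け': ['か', 'こ'], 'げ': ['が', 'ご'],
  'せ': ['さ', 'そ'], 'ぜ': ['ざ', 'ぞ'], 'て': ['た', 'と'],
  'で': ['だ', 'ど'], 'ね': ['な', 'の'], 'へ': ['は', 'ほ'],
  'べ': ['ば', 'ぼ'], 'ぺ': ['ぱ', 'ぽ'], 'め': ['ま', 'も'],
  'れ': ['ら', 'ろ'],
};

// Stand-in for the real dictionary lookup (assumption for this sketch).
const SAMPLE_DICTIONARY: readonly string[] = [
  'すごい', 'うるさい', 'しらない', 'おそい', 'しつこい',
];

// Returns the possible standard forms of a word ending in
// <え-row kana> + え (or ー), without checking whether they exist.
function colloquialCandidates(word: string): string[] {
  if (word.length < 2) return [];
  const last = word.charAt(word.length - 1);
  const prev = word.charAt(word.length - 2);
  if (last !== 'え' && last !== 'ー') return [];
  const sources = E_ROW_SOURCES[prev];
  if (sources === undefined) return [];
  const stem = word.slice(0, word.length - 2);
  return sources.map((kana) => stem + kana + 'い');
}

// Keeps only the candidates that are present in the dictionary.
function lookupColloquial(
  word: string,
  dictionary: readonly string[],
): string[] {
  return colloquialCandidates(word).filter(
    (candidate) => dictionary.indexOf(candidate) !== -1,
  );
}
```

With the sample dictionary, `lookupColloquial('うるせえ', SAMPLE_DICTIONARY)` tries うるさい and うるそい and keeps only うるさい. A word like かっけえ would not resolve this way, since its ending has shortened by more than one vowel pair.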

Two characters that make good exemplars of this are Inuyasha and Son Goku. It certainly seems to be a 少年 thing.

On an unrelated note, I see that you added that bit of code to prevent that font issue from happening again. That's awesome, and I really admire your continuing dedication to the project.

Sep 03 '20 12:09 ChocoChopin

Thanks for the extra context. You're right that it needs to be generic; I was spacing on more examples but definitely can think of some.

Adding these directly isn't too bad, but since you actually have to add a mapping for each hiragana ending in あ or お, the more sustainable approach would be to set up a script that generates deinflect.dat from higher-level rules. That will also ensure fewer mistakes when updating.
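As a rough idea of what such a generator might look like, here is a minimal sketch that expands one high-level rule ("え-row kana + え comes from あ-row or お-row kana + い") into individual mappings. Everything here is hypothetical: `KANA_ROWS`, `generateEeRules`, and `formatRule` are made-up names, the tab-separated from/to/type/reason layout is an assumption that would need checking against the real deinflect.dat, and the type and reason values are placeholders.

```typescript
interface DeinflectRule {
  from: string; // colloquial ending, e.g. ねえ
  to: string;   // standard ending, e.g. ない
}

// Each row lists the あ-row, え-row, and お-row kana of one consonant column.
const KANA_ROWS: ReadonlyArray<[string, string, string]> = [
  ['あ', 'え', 'お'], ['か', 'け', 'こ'], ['が', 'げ', 'ご'],
  ['さ', 'せ', 'そ'], ['ざ', 'ぜ', 'ぞ'], ['た', 'て', 'と'],
  ['だ', 'で', 'ど'], ['な', 'ね', 'の'], ['は', 'へ', 'ほ'],
  ['ば', 'べ', 'ぼ'], ['ぱ', 'ぺ', 'ぽ'], ['ま', 'め', 'も'],
  ['ら', 'れ', 'ろ'],
];

// Expands the single high-level rule into two mappings per row.
function generateEeRules(): DeinflectRule[] {
  const rules: DeinflectRule[] = [];
  KANA_ROWS.forEach((row) => {
    const a = row[0];
    const e = row[1];
    const o = row[2];
    rules.push({ from: e + 'え', to: a + 'い' });
    rules.push({ from: e + 'え', to: o + 'い' });
  });
  return rules;
}

// Serializes one rule as a tab-separated line. The column order and the
// meaning of `type` and `reasonIndex` are assumptions, not the confirmed
// deinflect.dat format.
function formatRule(
  rule: DeinflectRule,
  type: number,
  reasonIndex: number,
): string {
  return [rule.from, rule.to, String(type), String(reasonIndex)].join('\t');
}
```

Generating the lines from one table like this keeps every mapping consistent, and covering more patterns later (for example やい/よい/わい → ええ, as in はええ or こええ) would mean adding another small rule rather than hand-editing many entries.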

I'll make a separate issue for that. Thanks again for the feedback.

Sep 04 '20 00:09 melink14
