toHiragana and toKatakana methods are skipping kanji

Open white-miku opened this issue 4 years ago • 3 comments

Hello, I've created a simple page that reproduces the example from "Getting started", and I've noticed that in my implementation the toHiragana() and toKatakana() methods skip kanji.

Results:
Words: 庭 で ライム を 育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun, postposition, noun, postposition, verb, symbol
Hiragana: 庭でらいむを育てています。
Katakana: 庭デライムヲ育テテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭ニワデライムヲ育ソダテテイマス。

If you want, you can try it yourself: http://jpn.white-miku.me/index.php

Meanwhile, readings, pronunciations, romaji and furigana all work perfectly. Could this be a bug? Or maybe a MeCab misconfiguration? Thank you.

white-miku avatar May 14 '20 16:05 white-miku

Hello @white-miku. Parsing that works fine for me, so I don't think it's an issue with the package (I could be wrong, though). You can test your MeCab setup by running MeCab on the command line.

$ mecab
庭 で ライム を 育てています。
庭	名詞,一般,*,*,*,*,庭,ニワ,ニワ
で	助詞,格助詞,一般,*,*,*,で,デ,デ
ライム	名詞,一般,*,*,*,*,ライム,ライム,ライム
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
育て	動詞,自立,*,*,一段,連用形,育てる,ソダテ,ソダテ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS

If that works, then I'm guessing it's your implementation. If you paste your code in, I might be able to help out.

zachleigh avatar May 21 '20 13:05 zachleigh

Hello @zachleigh, thank you for the reply. I've run the example you provided, and the output looks the same as yours.

root@White-Miku:~# mecab
庭 で ライム を 育てています。
庭	名詞,一般,*,*,*,*,庭,ニワ,ニワ
で	助詞,格助詞,一般,*,*,*,で,デ,デ
ライム	名詞,一般,*,*,*,*,ライム,ライム,ライム
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
育て	動詞,自立,*,*,一段,連用形,育てる,ソダテ,ソダテ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS

My implementation is very close to the example provided in the Limelight documentation. It's just adapted for the Yii2 framework:

	public function process()
	{
		$this->processed = true;
		$limelight = new Limelight();
		$results = $limelight->parse($this->text);

		$this->words = $results->string('word', ' ');
		$this->readings = $results->string('reading');
		$this->pronunciation = $results->string('pronunciation');
		$this->lemma = $results->string('lemma');
		$this->partOfSpeech = $results->string('partOfSpeech', ', ');
		$this->hiragana = $results->toHiragana()->string('word');
		$this->katakana = $results->toKatakana()->string('word');
		$this->romaji = $results->string('romaji', ' ');
		$this->furigana = $results->string('furigana');
	}

white-miku avatar May 26 '20 05:05 white-miku

Me too. The toHiragana and toKatakana functions skip the kanji; my result is the same as @white-miku's. The same goes for string('furigana'): the furigana for the kanji come out as katakana, not hiragana.

onrsama avatar Aug 27 '20 13:08 onrsama
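
A possible workaround, not confirmed by the maintainer in this thread: the reading string reported above (ニワデライムヲソダテテイマス。) is already all katakana, so it can be converted to hiragana with PHP's built-in mb_convert_kana. The Hiragana output above is also consistent with a kana-only conversion of the 'word' string, which would leave kanji untouched. The following is a minimal sketch, assuming the mbstring extension is enabled; the helper name readingToHiragana is hypothetical and is not part of Limelight.

```php
<?php
// Hypothetical helper (not part of Limelight): converts a katakana
// reading to hiragana. Requires the mbstring extension.
function readingToHiragana(string $reading): string
{
    // Mode 'c' converts full-width katakana to full-width hiragana.
    // Kanji, punctuation and existing hiragana pass through unchanged.
    return mb_convert_kana($reading, 'c', 'UTF-8');
}

// Reading copied from the results posted at the top of this issue.
echo readingToHiragana('ニワデライムヲソダテテイマス。'), PHP_EOL;

// Kanji are not kana, so converting the surface text leaves them as-is.
echo readingToHiragana('庭でライムを育てています。'), PHP_EOL;
```

In the process() method above, the same call could be applied to the value returned by $results->string('reading') instead of chaining toHiragana() onto string('word'). Whether that matches the intended Limelight usage is an assumption.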